Genetic regulatory networks enable cells to respond to changes in internal and external
conditions by dynamically coordinating their gene expression profiles. Our ability to make
quantitative measurements in these biochemical circuits has deepened our understanding of what
kinds of computations genetic regulatory networks can perform and with what reliability. These
advances have motivated researchers to look for connections between the architecture and function
of genetic regulatory networks. Transmitting information between a network's inputs and its
outputs has been proposed as one such possible measure of function, relevant in certain biological
contexts. Here we summarize recent developments in the application of information theory to gene
regulatory networks. We first review basic concepts in information theory necessary to understand
recent work. We then discuss the functional complexity of gene regulation, which arises from the
molecular nature of the regulatory interactions. We end by reviewing some experiments supporting
the view that genetic networks responsible for early development of multicellular organisms might
be maximizing transmitted "positional" information.
I. INTRODUCTION

In the classical view of genetics, the information neces- 
sary for the functioning of a given organism is encoded in 
its DNA [1, 2]. Gene expression is a process by which this
information is extracted from the DNA in order to syn- 
thesize proteins that carry out specific functions in the 
cell. For instance, actin and tubulin provide structural 
support, myosin can generate physical forces, kinases and 
phosphatases are instrumental in intracellular signaling 
pathways, substrate-specific enzymes drive the metabolic 
cycle, and, ultimately, gene expression machinery itself 
needs to be synthesized from its DNA blueprint. Accord- 
ing to the central dogma, information flows from DNA 
to proteins: first the genes on DNA are transcribed into 
mRNA, which is converted by the ribosomes into amino 
acid sequences that fold into functioning proteins. It is 
quite obvious, however, that information must flow in 
the other direction as well, dictating under what condi- 
tions which proteins should be produced from their DNA 
blueprints. The best example of this is provided by multicellular
organisms: although all of their cells share the same ge- 
nomic DNA, they do not all express the same proteins, 
and it is this selective gene expression that allows the 
cells to specialize into different phenotypes, build up a 
range of tissues, and fulfill specific organismal functions. 

All cellular processes which control the expression of 
proteins are collectively called gene regulation. Gene reg- 
ulation can occur at essentially every step of extracting 
the information from the DNA: at the level of DNA pack- 
ing and epigenetic modifications, at transcription initia- 
tion, translation, through modifications of mRNAs, or 
through post-translational modifications of amino-acid 
sequences [3] . These processes are mainly effected by spe- 
cial proteins with regulatory function, among which we 
single out as a prominent example transcription factors 
that can modify the transcription activity at their target 
genes. At any moment, the state of a living organism is 
thus not described by its genome alone, but also by the
set of (regulatory) genes that the organism actually ex- 
presses and the concentration levels of the corresponding 
gene products. 

The possible phenotypic states of a cell correspond to 
distinct gene expression patterns. In this view, DNA and 
the associated regulation machinery give rise to a finite 
yet large number of possible cellular "outcome" states, 
while the actual state is selected from this possible range 
both by current internal and environmental conditions. 
Despite recent experimental and theoretical progress in 
characterizing molecular properties of various regulatory 
subunits and specific molecular pathways, we do not fully 
understand how these elements come together to form a 
functioning system and how precisely they fit into the 
conceptual picture outlined above. Classic genetic ex- 
periments on model systems, as well as bioinformatics in 
conjunction with high-throughput assays, have started 
to fill out the map of regulatory interactions in the cell, 
i.e. which transcription factor proteins regulate which 
genes, which pathways are activated under given condi- 
tions, and what is the role of non-transcriptional regula- 
tory mechanisms. What we are learning in terms of de- 
tail, however, is opening up new questions on the systems 
level: in trying to understand the experimentally recon- 
structed regulatory networks, we find them statistically 
far from random, but also far from how human engineers 
would go about solving the problems that cells are pre- 
sumably trying to solve. Our difficulty in understanding 
and reverse-engineering these networks gives rise to sev- 
eral important questions: Why do regulatory networks 
have the observed architectures? Are we correctly and 
quantitatively understanding the functions that they per- 
form? Can we look for factors that discriminate the net- 
works that exist in nature as opposed to the ones that 
do not? Are existing networks simply artifacts of evo- 
lutionary history, or are there features that discriminate 
them from the non-existent but in principle possible ar- 
chitectures? Are observed forms of gene regulation all 
necessary in different contexts or are they simply redun- 
dant? Can we go beyond mere characterizations of reg- 
ulatory networks and instead identify physical principles 
that govern the observed network behaviors? 

A number of groups have recently explored different 
physical principles that could influence the parameter 
regimes and modes of regulation in living organisms 
[TU] . Such approaches usually require one to choose a 
measure of network function. Among options being considered
are minimization of biochemical noise [TJ , optimization
of losses in case of unknown environmental signals
[5] , maximization of positional information I12j ,
or optimization of resources [T^ • Some of these strategies 
have also been considered in the presence of evolution- 
ary forces [HI [H]. Here, we review the work that has 
focussed on optimizing information transmission in gene 
regulatory networks. 

Assuming that information transmission is a viable
measure of network function, we can explore and compare
various network architectures and modes of gene regulation.
We note upfront that the assumption we are making
is a strong one and in general gene regulatory networks
need not be optimized at all; nevertheless, we claim that
(i) in certain biological contexts this assumption might
be close to valid; (ii) it will enable us to make some
analytic progress; (iii) even if not fully correct, such an
assumption allows us to make experimentally testable
predictions. We argue these points in detail in Section III C.

We start by introducing a mathematical framework in
which gene regulation can be described (Section II). We
then formalize the concept of information (Sections III A
and III B), proceed to review optimal networks in the
limit of small noise (Sections IV A and IV B) and beyond
this limit (Section IV C). Lastly, we discuss information
transmission in the presence of time-dependent
signals (Section IV D).

This review is primarily aimed at a physics readership, 
but we hope it can be enjoyed by anyone with an in- 
terest in the interface between information theory and 
gene regulation. We methodologically introduce both 
topics, neither of which is typically discussed in the tra- 
ditional physics curriculum. All results presented in this 
review have been published elsewhere; parts of this re- 
view follow the exposition of Ref [13] which discusses the 
links between statistical physics and biological networks 
in greater detail. To keep bibliography manageable, we 
decided to reference solely standard textbooks, papers 
that directly discuss signal transmission in biological net- 
works, and experimental papers that we provide as ex- 
amples in this manuscript. We do not provide extensive 
referencing for large and relevant fields discussing noise in 
gene expression, statistical and dynamical properties of 
regulatory networks in general, or papers providing bio- 
logical detail on early embryonic development; interested 
readers should consult Ref [TS] and references therein. 



II. FUNCTIONAL ASPECTS OF GENE 
REGULATION 

The expression of genes in cells is controlled mainly by 
binding and unbinding of regulatory proteins, called tran- 
scription factors (TFs), to specific short DNA sequences, 
called binding sites [16] . These regulatory proteins can 
act either as activators, which means they increase the 
rate of expression of the genes, or as repressors that de- 
crease the rate of expression of the regulated genes. The 
genetic sequence of the DNA is transcribed into mRNA 
by a holoenzyme called RNA polymerase. Activators of- 
ten act by recruiting the polymerase, whereas repressors 
often act by sterically blocking the polymerase from bind- 
ing. Ribosomes translate mRNA strands into proteins. 
TFs can cross- and self-regulate, opening up a possibility 
of feedback regulation. They are usually present in nuclei 
in small, nanomolar range concentrations (for a nucleus
with several μm radius, these concentrations correspond
to several hundred to thousands of TF molecules per nucleus).
The timescales of such regulation span a wide
range, from minutes to hours. 

Generally, the expression of genes can be regulated at 
all levels, from DNA looping to post-translational modi- 
fication of proteins. Often, many co-factors and enzymes 
are involved, and the process can be described in a molec- 
ularly detailed fashion. However, certain features can be 
abstracted and allow us to study generalized models of 
gene expression: 

• Regulation functions, i.e. functions that map the 
concentrations of TFs into levels of regulated gene 
expression, in gene regulatory networks are non- 
linear. There are saturation effects, for example 
when a gene is fully activated. Nonlinearities in 
regulation also set the range of input concentra- 
tions in which a network is responsive. In addition 
to the simple nonlinearities induced by saturation 
effects, networks often contain positive or negative 
feedback loops that can give rise to even more 
complicated behaviors. 

• Gene regulation is a noisy process. This is a 
consequence of the stochasticity in single molec- 
ular events at low concentrations of the relevant 
molecules, such as in reactions between TFs and 
binding sites (that can be present at copy num- 
bers of only one or two in the whole genome). The 
nanomolar concentrations of TFs in the cell mean 
that the precise timing when a TF finds and binds 
a regulatory site on the DNA is a random variable; 
this randomness results in stochastic gene activa- 
tion. 

• The processes involved in gene regulation happen 
on various time scales: the time on which the in- 
put fluctuates, the protein decay time, the gene ex- 
pression state fluctuation time, the time on which 
the external input signal changes. The networks 
are dynamical systems, and their behaviors span 
the range from settling down to one of the possi- 
ble stationary states, to generating intrinsic oscilla- 
tions (as in, e.g., circadian clocks) or more complex 
combinations of checkpoint steady states and limit 
cycle oscillations (as in cell-cycle control). 

• The wiring in the network is specific. Specificity 
is achieved by molecular mechanisms of recognition 
(TF-DNA interaction) . One TF can regulate many 
genes by recognizing and binding multiple sites in 
the genome, and each gene can be regulated by 
several TFs. 

One can describe a gene regulatory circuit at various 
levels of detail. All of them attempt to capture most of 
the properties listed above, with different emphasis on 
the particular points (see Refs [15, 17] for more information).
Here we will briefly review a few basic approaches
that we are going to use later in this review, on a specific 
example of a single regulatory element. 
FIG. 1: The simplest regulatory graph, where an input transcription
factor at concentration c regulates the output expression
level g by binding to a binding site n, which can be
empty or occupied. Since c acts as an activator, an occupied
site results in transcription and translation of g.

A. Gene regulatory elements: a mathematical 
primer 

Let transcription factors be present at concentration c
in the cell. On the DNA, there is a single specific binding
site that can be occupied or empty; we will denote this
occupancy with n(t). When the site is occupied, the
regulated gene will get transcribed into mRNA, which is
later translated into proteins whose count we denote by
g(t), at the combined rate that we denote by R. The
proteins are degraded with the characteristic time $\tau$. In
this case, our TF thus acts as an activator, see Fig. 1.
Here and afterwards we will refer to the transcription
factor c as an input, and the regulated gene product g as
output.

This model discards a lot of molecular complexity: 
there is no explicit treatment of diffusion of TFs, no 
non-specific binding, no separate treatment of mRNA 
and protein, no chromatin opening / closing etc; in ad- 
dition, we group many multi-stage molecular processes 
(such as TF binding, RNAP assembly, processive tran- 
scription etc) into single coarse-grained steps. Thus, our 
model is a gross (but tractable) oversimplification. As an 
illustration, let us formulate it in a few different mathe- 
matical frameworks. 

In the limit of relatively large concentrations, we can
treat concentrations c and g as continuous and describe
this regulatory process by the set of differential equations
for the means of the concentrations:

$$\frac{dn}{dt} = k_+ c(t)\,(1 - n) - k_- n, \tag{1}$$

$$\frac{dg}{dt} = R\,n - \frac{1}{\tau}\,g. \tag{2}$$

Equation (1) is an equation for occupancy n, which is a
number between 0 and 1. Nominally, the site can only
be fully empty or occupied, but in this approximation,
we treat it as a continuous variable that can be interpreted
as a "probability of the site being bound." $k_+ c$ is
the TF-concentration-dependent on-rate, and $k_-$ is the
first-order off-rate. Often, it is assumed that there is a
separation of time scales: the first equation for occupancy
equilibrates much faster than $\tau$, meaning that the mean
occupancy

$$\bar{n}(t) = \frac{k_+ c(t)}{k_+ c(t) + k_-} \tag{3}$$

can be inserted into Eq (2) to get

$$\frac{dg}{dt} = -\frac{1}{\tau}\,g(t) + R\,\frac{k_+ c(t)}{k_+ c(t) + k_-}. \tag{4}$$

In this simple case without feedback, the approach to the
equilibrium at fixed c is exponential with the time constant $\tau$, and
the steady state is simple: $\bar{g} = R\tau\bar{n}$. The effective
production rate $R\bar{n}$ in Eq (4) is a function of c with a sigmoidal
shape. We discuss in the next section how the particular
sigmoidal regulation functions are connected to equilibrium
statistical mechanics of this system, how noise can
be added by an introduction of the Langevin force into
Eq (2), and why the assumption of fast equilibration of
n strongly influences the noise.
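The mean-field description above can be checked numerically. The following is a minimal sketch, not part of the original review: it integrates the occupancy and output equations with a forward-Euler step and compares the result with the closed-form steady state $\bar{g} = R\tau\bar{n}$. The function names, the step size and all parameter values are illustrative assumptions in arbitrary units.

```python
def simulate_mean_field(c, k_on, k_off, R, tau, t_end, dt=1e-3):
    """Forward-Euler integration of
         dn/dt = k_on*c*(1 - n) - k_off*n
         dg/dt = R*n - g/tau
    starting from an empty site (n = 0) and no protein (g = 0)."""
    n, g = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dn = k_on * c * (1.0 - n) - k_off * n
        dg = R * n - g / tau
        n += dt * dn
        g += dt * dg
    return n, g


def steady_state(c, k_on, k_off, R, tau):
    """Closed-form steady state: mean occupancy and output g = R*tau*n."""
    n_bar = k_on * c / (k_on * c + k_off)
    return n_bar, R * tau * n_bar


# Illustrative parameter values (assumptions, arbitrary units).
params = dict(c=2.0, k_on=1.0, k_off=1.0, R=10.0, tau=5.0)
n_num, g_num = simulate_mean_field(t_end=100.0, **params)
n_ss, g_ss = steady_state(**params)
print(f"occupancy: simulated {n_num:.4f}, closed form {n_ss:.4f}")
print(f"output:    simulated {g_num:.4f}, closed form {g_ss:.4f}")
```

With these values the integration runs for twenty protein lifetimes, so both quantities should have relaxed to their steady-state values.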

Suppose we wanted to capture the idea that the number
of molecules in the system is discrete and that reactions
between them are stochastic. In this case the object
of our inquiry would be $P_n(g|t,c)$: the time-dependent
joint probability of observing g molecules of the resulting
gene and the state of the binding site being n = 0, 1
(empty, occupied), given some concentration of the input
c. One can marginalize this distribution over n to
get the evolution of probability of observing g output
molecules: $P(g|t,c) = \sum_{n=0,1} P_n(g|t,c)$. Writing down
the master equation [18, 19] and for simplicity suppressing
the parameters (c, t) on which all terms $P_n(g|t,c)$ are
conditioned, we find:

$$\begin{aligned}
\frac{dP_0(g)}{dt} &= \frac{g+1}{\tau}\,P_0(g+1) + k_- P_1(g) - \left(k_+ c + \frac{g}{\tau}\right) P_0(g), \\
\frac{dP_1(g)}{dt} &= \frac{g+1}{\tau}\,P_1(g+1) + R\,P_1(g-1) + k_+ c\,P_0(g) - \left(k_- + \frac{g}{\tau} + R\right) P_1(g);
\end{aligned} \tag{5}$$

the reader should recognize degradation-related terms
(proportional to $1/\tau$), the protein production terms (prefixed
with R and present only in the case when the gene
is on, i.e. n = 1) and the switching terms of the promoter
containing $k_+ c$ and $k_-$, which couple the n = 0 to n = 1
states. In this simple case, the equilibrium distribution
can be solved by zeroing out the left-hand side of Eqs (5).
This yields an infinite dimensional system in g that can
be truncated at some $g_{\max} \gg R\tau$; we would end up with a
homogeneous linear system that can be supplemented by a
normalization condition $\sum_{n=0,1}\sum_{g=0}^{g_{\max}} P_n(g) = 1$, which
can be inverted and solved for steady state $P_n(g)$. More
sophisticated methods are available when the number of
genes grows and they are interacting [20]. Note that in
this example we treated g as discrete, but c is still a continuous
input parameter (not a variable whose distributions
we are also interested in). We can directly calculate
the moments $\langle g^k \rangle = \sum_{g,n} g^k P_n(g)$ from the steady state
master equations. If we define $\bar{n} = \sum_g P_1(g)$, we reproduce
(with k = 1) the equations for the averages $\bar{g}(t)$,
$\bar{n}(t)$ in Eqs (1, 2).
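The truncation-and-normalization procedure described above can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions rather than the method used in the cited work: it builds the generator of the two-state master equation on a grid truncated at a hypothetical `g_max`, replaces one redundant equation by the normalization condition, and solves the linear system with a small hand-written Gaussian elimination. The parameter values and the helper names `solve_linear` and `steady_state_master` are assumptions.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def steady_state_master(c, k_on, k_off, R, tau, g_max):
    """Steady state P_n(g) of the truncated master equation, n in {0, 1}."""
    size = 2 * (g_max + 1)
    idx = lambda n, g: n * (g_max + 1) + g
    A = [[0.0] * size for _ in range(size)]   # dP/dt = A P

    def add(src, dst, rate):
        A[dst][src] += rate   # probability flowing into dst
        A[src][src] -= rate   # probability leaving src

    for g in range(g_max + 1):
        add(idx(0, g), idx(1, g), k_on * c)          # TF binds
        add(idx(1, g), idx(0, g), k_off)             # TF unbinds
        if g < g_max:                                # truncation at g_max
            add(idx(1, g), idx(1, g + 1), R)         # production when "on"
        if g > 0:
            for n in (0, 1):
                add(idx(n, g), idx(n, g - 1), g / tau)   # degradation

    A[-1] = [1.0] * size      # replace one redundant row by normalization
    b = [0.0] * size
    b[-1] = 1.0
    P = solve_linear(A, b)
    return P[: g_max + 1], P[g_max + 1:]


# Illustrative parameters (assumptions): R*tau = 4, so g_max = 40 >> R*tau.
P0, P1 = steady_state_master(c=1.0, k_on=1.0, k_off=1.0, R=2.0, tau=2.0, g_max=40)
P = [p0 + p1 for p0, p1 in zip(P0, P1)]
n_bar = sum(P1)
mean_g = sum(g * p for g, p in enumerate(P))
var_g = sum(g * g * p for g, p in enumerate(P)) - mean_g ** 2
print(f"n_bar = {n_bar:.4f}, <g> = {mean_g:.4f}, var(g) = {var_g:.4f}")
```

The first moment should agree with the mean-field result $\bar{g} = R\tau\bar{n}$, while the variance exceeds the mean because promoter switching adds noise on top of the birth-death fluctuations.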

One can also expand the master equation to second order.
If we assume that the gene expression state changes
on fast timescales compared to the change in the number
of proteins, we obtain the Fokker-Planck equation for
$P(g) = P_0(g) + P_1(g)$:

$$\frac{\partial P(g)}{\partial t} = -\frac{\partial}{\partial g}\left[\left(R\,\frac{k_+ c(t)}{k_+ c(t) + k_-} - \frac{g}{\tau}\right) P(g)\right] + \frac{1}{2}\,\frac{\partial^2}{\partial g^2}\left[\left(R\,\frac{k_+ c(t)}{k_+ c(t) + k_-} + \frac{g}{\tau}\right) P(g)\right]. \tag{6}$$

Equation (4) can be recovered again by calculating the
mean of g from the Fokker-Planck equation. Both the
master Eqs (5) and Fokker-Planck Eq (6) allow us to calculate
higher order moments apart from the mean; for instance,
by computing the second moments, we can write
down the fluctuation of the number of proteins around
its mean, $\sigma_g^2(t, c) = \langle g^2 \rangle - \langle g \rangle^2$. This is intrinsic noise, or
stochasticity in gene expression due to the randomness
and discrete nature of molecular interactions.

If we assume a priori that a gene regulatory process
can be described well in terms of the dynamics for the
mean values [as in Eqs (1, 2)] plus a Gaussian fluctuation
around the mean (ignoring higher order moments), there
exists a systematic procedure for calculating the resulting
noise variances, called the Langevin approximation. In
this approximation one starts by writing down ordinary
differential equations for the mean values, and adds an
ad hoc noise force,

$$\frac{dg}{dt} = -\frac{1}{\tau}\,g + R\,\frac{k_+ c(t)}{k_+ c(t) + k_-} + \xi(t). \tag{7}$$

The "random force" term $\xi$ takes a form that we need
to assume based on physical intuition and more formal
methods (e.g. the Fokker-Planck equation). In this case
we can postulate that the noise magnitude $\Gamma$ depends on
the state of the system, but that fluctuations are zero
mean, $\langle \xi(t) \rangle = 0$, random and uncorrelated in time, i.e.
$\langle \xi(t)\xi(t') \rangle = 2\Gamma(g)\,\delta(t - t')$ (angle brackets denote averaging over
many realizations of the noise time series). We will discuss
noise in gene expression in detail later, including a
worked-out example using the Langevin approximation. For
a detailed derivation of various approximations in gene
regulation see Ref [17].

Finally, let us mention the numerical Gillespie algorithm
[21]. For this algorithm we start with enumerating
all reactions i and their rates $r_i$:

$$\begin{aligned}
r_1 = k_+ &: \quad c + n \rightarrow cn, \\
r_2 = k_- &: \quad cn \rightarrow c + n, \\
r_3 = R &: \quad cn \rightarrow cn + g, \\
r_4 = 1/\tau &: \quad g \rightarrow \emptyset.
\end{aligned} \tag{8}$$

The state of the system is then initialized as a vector
(c, n, cn, g) of integer counts of molecular species (here
cn denotes a molecular complex of a c molecule bound
to the promoter; there can only be 0 or 1 n and cn, and
one can quickly check that $n = 1 - cn$). Then the probability
per unit time of each of the 4 reactions is the
product of the rate constant and the number of reactants
properly normalized by the relevant volume. The
algorithm randomly draws the next reaction consistent
with the probabilities per unit time, updates the state
of the system and repeats. This algorithm is exact for
well-mixed systems, but (i) it can be slow in case there
are fast and slow reactions in the system; (ii) one needs
to sample many simulation runs to accumulate the noise
statistics; (iii) it can become incorrect in biological systems
where transport (e.g. diffusion) needs to be taken
into account explicitly.
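As an illustration of the procedure just described, here is a minimal sketch of a Gillespie simulation for the single binding site. It is a simplified variant written for this example: the input c is treated as a fixed concentration rather than as an integer molecule count that is depleted by binding, and the site is tracked by a single occupancy flag. The function name, the seed and all parameter values are assumptions.

```python
import random


def gillespie(c, k_on, k_off, R, tau, t_end, seed=0):
    """Simulate the four reactions for one binding site and return the
    time-averaged occupancy and protein count over [0, t_end]."""
    rng = random.Random(seed)
    t, occupied, g = 0.0, 0, 0
    occ_time, g_time = 0.0, 0.0     # time integrals of occupancy and g
    while True:
        rates = (
            k_on * c * (1 - occupied),  # binding (only if the site is empty)
            k_off * occupied,           # unbinding (only if occupied)
            R * occupied,               # production (only if occupied)
            g / tau,                    # degradation of one protein
        )
        total = sum(rates)
        wait = rng.expovariate(total)   # exponential waiting time
        if t + wait >= t_end:           # stop at the end of the time window
            occ_time += occupied * (t_end - t)
            g_time += g * (t_end - t)
            break
        occ_time += occupied * wait
        g_time += g * wait
        t += wait
        u = rng.random() * total        # choose reaction i with prob. r_i/total
        if u < rates[0]:
            occupied = 1
        elif u < rates[0] + rates[1]:
            occupied = 0
        elif u < rates[0] + rates[1] + rates[2]:
            g += 1
        elif g > 0:
            g -= 1
    return occ_time / t_end, g_time / t_end


# Illustrative parameters (assumptions): expected occupancy 0.5, mean g = 5.
n_avg, g_avg = gillespie(c=1.0, k_on=1.0, k_off=1.0, R=10.0, tau=1.0, t_end=5000.0)
print(f"time-averaged occupancy = {n_avg:.3f}, time-averaged g = {g_avg:.3f}")
```

A single long trajectory is used here to estimate time averages; accumulating full noise statistics would require either longer runs or many independent runs, as noted in point (ii) above.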

From the presented example it is clear that the fully 
stochastic dynamical description can be relatively com- 
plicated even for a very simple system. To proceed and 
be able to connect to data, we will drop the time depen- 
dence and only focus on the steady state, while empha- 
sizing the nonlinear and noisy nature of the system. Our 
assumption to only study the steady state will preclude 
us from discussing network phenomena that are intrinsi- 
cally dynamic, e.g. the cell cycle or the circadian clock. 
But for many biologically realistic cases, such as in de- 
velopmental biology, or in many experimental settings, 
such as measuring the gene response to constant levels of 
inducer, the steady state approach is useful. 



B. Regulation by a single transcription factor 

In this section we will explore simple thermodynamical 
models of gene regulation, by studying how the concentration
of a transcription factor relates to promoter occupancy
and thus to the expression level of the regulated
gene. A detailed discussion of the thermodynamic approach
to gene regulation with worked out examples for
various regulatory strategies can be found in Refs [25, 26].

In the previous section we saw that we can obtain 
the expression for the mean promoter occupancy directly 
from the master equation, assuming that the system is in 
equilibrium. Under this assumption we can ask for the 
equivalent statistical mechanics description which, as we 
shall see, can be easily generalized to larger systems. 

Suppose we have a site n that can be occupied or
empty. In case it is occupied, there is a binding energy
E favoring the occupied state, relative to the reference
energy in the unbound state. But in order to occupy
the state, one needs to remove one molecule of TF from
the solution. The chemical potential of TFs, or the free
energy cost of removing a single molecule of TF from the
solution, is $\mu = k_B T \log c$, where c is the TF concentration
measured in some dimensionless units of choice.
In statistical physics we can calculate every equilibrium
property of the system if we know how to compute the
partition sum, which is $Z = \sum_i e^{-\beta(E_i - \mu n_i)}$, where the
sum is taken over all possible states of the system (in our
case binding site empty and binding site occupied), $E_i$
is the energy of the system in the state i, and $n_i$ is the
number of molecules in the system in the state i.

In our case of a single binding site, the partition sum
is taken over the empty (n = 0) and occupied (n = 1)
states:

$$Z = 1 + e^{-\beta(E - \mu)}, \tag{9}$$

where $\beta = 1/(k_B T)$, T is the temperature in Kelvin and
$k_B$ is the Boltzmann constant. The probability that the
site is occupied is then

$$P(n = 1) = \frac{e^{-\beta(E - \mu)}}{1 + e^{-\beta(E - \mu)}}. \tag{10}$$

Inserting the definition of $\mu$, we get

$$P(n = 1) = \frac{c}{c + K_d}, \tag{11}$$

where we write $K_d = \exp(\beta E)$. But $\bar{n} = 1 \cdot P(n = 1) + 0 \cdot P(n = 0) = P(n = 1)$,
so by comparing with the steady state of
Eq (1) we can make the identification

$$K_d = \frac{k_-}{k_+}, \tag{12}$$

which connects our statistical mechanics and dynamical
pictures. Note that $k_-$ is measured in units of inverse
time, $\mathrm{s}^{-1}$, and $k_+$ is measured in units of $\mathrm{s}^{-1} \times [\mathrm{conc}]^{-1}$
(but by convention we here measure concentration in dimensionless
units, as in $\mu = k_B T \log c$), so $K_d$ has units
of concentration.
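The correspondence between the thermodynamic and kinetic pictures can be verified numerically. The sketch below, with hypothetical function names and illustrative values for the binding energy and rates, computes the occupancy once from the Boltzmann weights with $\mu = k_B T \log c$ and once from the steady state of the binding kinetics, choosing the off-rate so that $k_-/k_+ = \exp(\beta E)$.

```python
import math


def occupancy_thermo(c, E, beta=1.0):
    """Occupancy from the two-state partition sum Z = 1 + exp(-beta*(E - mu))."""
    mu = math.log(c) / beta              # chemical potential, mu = k_B T log c
    w = math.exp(-beta * (E - mu))       # Boltzmann weight of the occupied state
    return w / (1.0 + w)


def occupancy_kinetic(c, k_on, k_off):
    """Steady-state occupancy of dn/dt = k_on*c*(1 - n) - k_off*n."""
    return k_on * c / (k_on * c + k_off)


# Illustrative values (assumptions): energies in units of k_B T, beta = 1.
E = -2.0
K_d = math.exp(E)        # dissociation constant K_d = exp(beta * E)
k_on = 1.0
k_off = K_d * k_on       # enforces K_d = k_off / k_on

for c in (0.01, K_d, 1.0, 10.0):
    print(f"c = {c:8.4f}  thermo = {occupancy_thermo(c, E):.6f}  "
          f"kinetic = {occupancy_kinetic(c, k_on, k_off):.6f}")
```

At $c = K_d$ both expressions give half occupancy, which is the usual operational meaning of the dissociation constant.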

Suppose we make the model somewhat more complicated:
let us have two binding sites, which together will
constitute a system with 4 possible states of occupancy:
both sites empty, either one occupied, and both occupied,
which we will write compactly as (00, 01, 10, 11). Let us
also assume that there is cooperativity in the system: if
both sites are occupied, then there will be an additional
favorable energetic contribution of $\epsilon$ to the total energy
of the state (11). Finally, when promoters can have multiple
internal states, we need to decide which state is the
"active" state, when the gene is being transcribed [77]:
here we pick the state (11) as the active state.

The probability of being active is then

$$P(11) = \frac{e^{-2E - \epsilon + 2\mu}}{1 + 2e^{-E + \mu} + e^{-2E - \epsilon + 2\mu}}, \tag{13}$$

where we use the units where $\beta = 1$, that is, we express
the energies and chemical potential in thermal units of
$k_B T$. If the cooperativity is strong, i.e. the additional
gain in energy $\epsilon$ is larger than the favorable energy of
putting a molecule of TF out of the solution onto the
binding site, $\epsilon \ll \mu - E$, we can drop the middle term of
the denominator in Eq (13) and simplify it into

$$P(11) = \frac{c^2}{c^2 + K_d^2}, \tag{14}$$

with $K_d = \exp[\beta(E + \epsilon/2)]$, where again we have used the
definition of chemical potential $\mu$. This problem with 2
binding sites and 4 states of occupancy also has a complementary
dynamical picture, which is already quite complicated,
see Fig. 2. We also note that the same behavior
for occupancy given by Eq (14) can be derived directly
from a master equation, assuming that the binding of
dimers is necessary to activate the gene ($k_+ c^2 P_0(g)$ instead
of $k_+ c P_0(g)$ in Eq (5)).

FIG. 2: The transitions in the model with 2 binding sites
and 4 occupancy states, (00, 01, 10, 11). The binding of an
additional molecule of TF happens at a rate $k_+ c$, whereas
unbinding rates are state dependent: a singly occupied promoter
returns to the non-occupied state with a rate $k_-$, but
the doubly occupied promoter loses a molecule of TF with a
different rate $k_-'$. This difference is due to cooperativity, where the
binding of one molecule stabilizes the binding of the other,
and this makes the unbinding rates state dependent. The
"active" state is (11) in the lower right corner. We leave
it as an exercise for the reader to write down the dynamical
equations $dn_{00}/dt = \ldots$, $dn_{01}/dt = \ldots$ etc, observe that
$n_{00} + n_{01} + n_{10} + n_{11} = 1$, and compute the steady state activation
if cooperativity is strong, $n_{11}$. As in the case of a
single binding site, this expression can be connected to the
thermodynamic result of Eq (13).
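To see how well the strong-cooperativity simplification works, one can compare the full four-state expression for the probability of the doubly occupied state with its two-state (Hill-like) limit. The following sketch is illustrative only: it uses units with $\beta = 1$ and $\mu = \log c$, denotes the cooperative energy by `eps`, and the function names, energy values and concentration grid are assumptions.

```python
import math


def p_active_exact(c, E, eps):
    """P(11) from the four-state partition sum (beta = 1, mu = log c)."""
    x = c * math.exp(-E)              # exp(-E + mu), weight of one bound TF
    w11 = x * x * math.exp(-eps)      # exp(-2E - eps + 2*mu), both sites bound
    return w11 / (1.0 + 2.0 * x + w11)


def p_active_hill(c, E, eps):
    """Strong-cooperativity limit: c^2 / (c^2 + K_d^2), K_d = exp(E + eps/2)."""
    K_d = math.exp(E + eps / 2.0)
    return c ** 2 / (c ** 2 + K_d ** 2)


def max_gap(E, eps):
    """Largest difference between the two expressions on a log-spaced grid
    of concentrations spanning three decades on either side of K_d."""
    K_d = math.exp(E + eps / 2.0)
    cs = [K_d * 10 ** (k / 10.0) for k in range(-30, 31)]
    return max(abs(p_active_exact(c, E, eps) - p_active_hill(c, E, eps)) for c in cs)


E = -1.0   # illustrative binding energy in units of k_B T (assumption)
for eps in (0.0, -2.0, -5.0, -10.0):
    print(f"eps = {eps:6.1f}   max |exact - Hill| = {max_gap(E, eps):.4f}")
```

The gap shrinks as the cooperative energy becomes more favorable, which is the regime in which the singly occupied states carry negligible weight.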

Readers used to molecular biology models of gene regulation
will recognize sigmoidal functions in Eqs (11, 14),
also known as Hill functions, with a general form (see
Fig. 3):

$$\bar{n}(c) = \frac{c^h}{c^h + K_d^h}, \tag{15}$$

where the dissociation constant $K_d$ is interpreted as the
concentration at which the promoter is half induced, and
h is known as the cooperativity or Hill coefficient, usually
interpreted as the "number of binding sites" [75]. Here we
have shown how such phenomenological curves arise from
simple statistical mechanics models of gene regulation
with cooperative interactions. For repressors, one can
show that $\bar{n}(c) = K_d^h/(c^h + K_d^h)$.




[Figure 3: plot of Hill functions; legend entries include h = 1, K_d = 1 and h = 3, K_d = 1.]

FIG. 3: Three Hill regulatory functions with different slopes
(Hill coefficients h), as in the legend. All functions have $K_d = 1$.
Input TF concentration is customarily plotted on a logarithmic
horizontal axis, while the average promoter occupancy $\bar{n}$
is on the vertical axis. The output gene expression is in steady
state $g = (R\tau)\,\bar{n}(c)$, i.e. proportional to occupancy. The slope
of $\bar{n}(c)$ on the log-log plot at half-induction ($c = K_d$) is related
to the Hill coefficient, $d(\log \bar{n})/d(\log c)\,|_{c = K_d} = h/2$.
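The statement about the log-log slope at half-induction can be checked with a short numerical sketch. The code below evaluates activator and repressor Hill functions and estimates $d(\log \bar{n})/d(\log c)$ at $c = K_d$ by a central finite difference; the function names and the step size `d` are assumptions made for this example.

```python
import math


def hill_activator(c, K_d, h):
    """Hill function for an activator: c^h / (c^h + K_d^h)."""
    return c ** h / (c ** h + K_d ** h)


def hill_repressor(c, K_d, h):
    """Hill function for a repressor: K_d^h / (c^h + K_d^h)."""
    return K_d ** h / (c ** h + K_d ** h)


def log_log_slope(f, c, K_d, h, d=1e-6):
    """Central finite-difference estimate of d(log f)/d(log c) at c."""
    hi = math.log(f(c * math.exp(d), K_d, h))
    lo = math.log(f(c * math.exp(-d), K_d, h))
    return (hi - lo) / (2.0 * d)


K_d = 1.0
for h in (1, 2, 3):
    slope = log_log_slope(hill_activator, K_d, K_d, h)
    print(f"h = {h}: occupancy at K_d = {hill_activator(K_d, K_d, h):.2f}, "
          f"log-log slope = {slope:.4f} (h/2 = {h / 2})")
```

The same helper applied to `hill_repressor` gives the mirror-image slope of $-h/2$ at half-induction.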



Before proceeding, let us inspect more closely the relation
between the dynamical rates and the binding energy
for a single site: $k_-/k_+ = \exp(\beta E)$. As we have
shown in Eq (12), this equality is required by detailed balance
if thermodynamic and kinetic pictures are to match.
Molecularly, the energy of binding E in the case of transcription
factor - DNA interaction depends on the DNA
sequence. So if we were to vary the sequence and thereby change
the binding energy E, which of the two rates, $k_-$ or $k_+$,
would vary as a result? In general one cannot answer this
question without knowing in detail the sequence of molecular
transitions that happen at the binding site. However,
there is a useful limit, called the diffusion-limited
on-rate, that is often applicable. In this regime, the limit
to how quickly a TF molecule can bind is given by the
speed at which it can diffuse to the binding site. It has
been shown that if a TF diffuses with diffusion constant
D and is trying to bind a site with linear dimension a,
the fastest on-rate is $k_+ \approx 4\pi D a$, for spherical TF and
binding site [75] [57]. In the diffusion-limited approach, if
the binding site is empty, as soon as a TF diffuses into
a region of size a around the binding site, it will immediately
bind. Then, all dependence on binding energy E
will be absorbed into the off-rate $k_-$. Intuitively we can
understand this by imagining that once the TF is bound
in an energetically favorable configuration, it has to wait
for a random thermal kick of typical size $k_B T$ to unbind,
and the probability of that kick being able to overcome
the binding energy barrier E is $\sim \exp(E/k_B T)$. We will
return to this limit in Section II D.



C. Regulation by several transcription factors 

In the previous chapter we have shown how thermody- 
namic and kinetic models are connected for simple cases 
of gene regulation where a single transcription factor 
binds cooperatively to different numbers of binding sites. 
In many cases, however, several transcription factors to- 
gether regulate a single gene. How can such situations 
be addressed from a theoretical perspective? We will de- 
scribe two molecular frameworks for describing the joint 
regulation by two TFs. Both approaches can be easily 
generalized to more types of TFs. 

In the previous section we have motivated and derived
Hill-type regulation functions. If we are considering a
gene g regulated by two TFs, we need to be precise how
these proteins act together, that is, we need to specify
the "regulatory logic" of their interaction. For example, if
gene g is activated by TF A, present at concentration $c_A$,
and repressed by TF B, present at concentration $c_B$, one
could postulate (without deriving) that the occupancy of
the promoter is

$$\bar{n}(c_A, c_B) = \frac{c_A^{h_A}}{c_A^{h_A} + K_A^{h_A}} \cdot \frac{K_B^{h_B}}{c_B^{h_B} + K_B^{h_B}}. \tag{16}$$

This expression assumes that molecules of A bind independently
(of B) to $h_A$ sites with dissociation constant
$K_A$, and molecules of B bind to $h_B$ sites with dissociation
constant $K_B$; importantly, we also assume that the
joint regulation is AND-like, meaning that gene g will only
be activated when both A is bound and B is not bound
[that is why there is a product in Eq (16)]. Conversely, in
an alternative model the action of TF A and TF B could
be additive:

$$\bar{n}(c_A, c_B) = C_1\,\frac{c_A^{h_A}}{c_A^{h_A} + K_A^{h_A}} + (1 - C_1)\,\frac{K_B^{h_B}}{c_B^{h_B} + K_B^{h_B}}. \tag{17}$$

$C_1$ is a number in the interval [0, 1], which balances the effect of
both types of TFs on the expression of gene g. Another
model might assume a combination of cooperative regulation
given by Eq (16) and an additive model given by
Eq (17). More complex schemes like this one can clearly
be derived, and while they will not necessarily correspond
to any possible thermodynamic system, they might be
useful phenomenological models that can be fitted to the
data.
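The difference between the two phenomenological choices of regulatory logic is easiest to see by evaluating them at a few input combinations. The sketch below implements an AND-like product of an activating and a repressing Hill function and the additive alternative with a mixing weight `C1`; the function names and all numerical values are illustrative assumptions.

```python
def hill_act(c, K, h):
    """Activating Hill term: c^h / (c^h + K^h)."""
    return c ** h / (c ** h + K ** h)


def hill_rep(c, K, h):
    """Repressing Hill term: K^h / (c^h + K^h)."""
    return K ** h / (c ** h + K ** h)


def n_and(cA, cB, KA, KB, hA, hB):
    """AND-like logic: A must be bound and B must not be bound."""
    return hill_act(cA, KA, hA) * hill_rep(cB, KB, hB)


def n_additive(cA, cB, KA, KB, hA, hB, C1):
    """Additive logic: weighted sum of the two contributions, 0 <= C1 <= 1."""
    return C1 * hill_act(cA, KA, hA) + (1.0 - C1) * hill_rep(cB, KB, hB)


# Illustrative parameters (assumptions).
pars = dict(KA=1.0, KB=1.0, hA=2, hB=2)
C1 = 0.3
low, high = 1e-3, 1e3
for cA, cB in ((low, low), (high, low), (low, high), (high, high)):
    print(f"cA = {cA:7.3g}, cB = {cB:7.3g}:  "
          f"AND = {n_and(cA, cB, **pars):.3f},  "
          f"additive = {n_additive(cA, cB, C1=C1, **pars):.3f}")
```

With the activator absent and the repressor absent, the AND-like model stays off while the additive model is partially on at a level set by $1 - C_1$, which is the kind of qualitative difference that could in principle be distinguished in a fit to data.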

We can also pick a real thermodynamic model that 
is flexible enough to encompass many possible combina- 
torial strategies of gene regulation, while still having a 
small enough number of parameters to connect to available
data. As in the previous case, this model might not correspond
on a molecular level to the events on the promoter, and would
thus also qualify as a phenomenological model. It would,
however, have the advantage of being more easily interpretable
and understandable within the context of statistical physics.
One such model is the so-called Monod-Wyman-Changeux (MWC)
model.

The MWC model can easily be extended to include combinatorial
regulation. The model was motivated by work on allosteric
transitions and was used to explain hemoglobin function [28].
When applied to the case of
gene regulation, the central idea is that as a whole, the 
promoter can be in two states, "on" (1) and "off" (0). 
Remember that in our previous examples we had to de- 
clare one of the combinatorial states as the "active" state; 
here, this distinction is built into the model by assump- 
tion. See Ref for recent work that uses MWC to 
include the effect of nucleosomes on gene expression. 

The regulatory region has $n$ binding sites for transcription
factor A. These sites can be bound in both the active and the
inactive state, and molecules of A always bind independently,
see Fig. 4. However, the binding energy for each molecule of A
to its binding site is state-dependent, i.e. $E_A^0$ when the
whole promoter is "off" vs $E_A^1$ when it is "on." Let's work
out the thermodynamics of this system. For each of the two
states, we can write down the free energies of $k$ molecules of
type A bound:

$$ F_1 = k\,(E_A^1 - \mu), \tag{18} $$
$$ F_0 = k\,(E_A^0 - \mu) - L, \tag{19} $$



where $\mu = \log c$ (we are again writing everything in units
of $k_B T$ and using dimensionless concentration), and $L$
measures how favoured the "off" state is against the "on" state
even with no TF molecules bound. The partition function is then

$$ Z = \sum_{k=0}^{n} \binom{n}{k} e^{-k(E_A^1 - \mu)} + e^{L} \sum_{k=0}^{n} \binom{n}{k} e^{-k(E_A^0 - \mu)}. \tag{20} $$



Recognizing that the sums are simply binomial expansions [80],
we get for the probability of the "on" state (proportional to
the expression of the gene):

$$ P(\mathrm{on}) = \frac{(1 + e^{-E_A^1 + \mu})^n}{(1 + e^{-E_A^1 + \mu})^n + (1 + e^{-E_A^0 + \mu})^n e^{L}} \tag{21} $$
$$ \phantom{P(\mathrm{on})} = \frac{(1 + c/K_A^1)^n}{(1 + c/K_A^1)^n + \mathcal{L}\,(1 + c/K_A^0)^n}. \tag{22} $$

Equation (22) is written in the standard form, with the
identifications $K_A^1 = \exp(\beta E_A^1)$,
$K_A^0 = \exp(\beta E_A^0)$ and $\mathcal{L} = \exp(L)$.
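
The step from the Boltzmann sums to the closed form can be checked numerically. The sketch below assumes the conventions used above (energies in units of $k_B T$, $\mu = \log c$, the "off" state carrying an extra weight $e^{L}$); the function names and the test values are hypothetical and chosen only for illustration.

```python
import math


def p_on_sum(c, n, E1, E0, L):
    """P(on) from explicit Boltzmann sums over k = 0..n bound molecules.

    Energies are in units of k_B T, c is dimensionless, mu = log(c).
    """
    mu = math.log(c)
    z_on = sum(math.comb(n, k) * math.exp(-k * (E1 - mu))
               for k in range(n + 1))
    z_off = sum(math.comb(n, k) * math.exp(-k * (E0 - mu) + L)
                for k in range(n + 1))
    return z_on / (z_on + z_off)


def p_on_closed(c, n, K1, K0, Lcal):
    """Closed form of Eq (22), with K = exp(E) and Lcal = exp(L)."""
    on = (1.0 + c / K1) ** n
    off = Lcal * (1.0 + c / K0) ** n
    return on / (on + off)


if __name__ == "__main__":
    # Assumed values: tighter binding in the "on" state (E1 < E0).
    c, n, E1, E0, L = 0.7, 3, -1.0, 2.0, 3.0
    print(p_on_sum(c, n, E1, E0, L))
    print(p_on_closed(c, n, math.exp(E1), math.exp(E0), math.exp(L)))
```

Both routes give the same number up to rounding, which is just the binomial theorem applied separately to the "on" and the "off" sum.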

The regulatory impact of transcription factor A on the
regulated gene is described by the quantities
$(n, K_A^0, K_A^1)$ in the MWC model. There is one additional
parameter $L$, the offset (or "leak") favoring the "off" state.
Note that the parameters of the MWC model are not directly
comparable to the Hill model parameter $K_d$; however, we can
make the identification in the regime where $c/K_A^0 \ll 1$ and
$c/K_A^1 \gg 1$. Then the term $(1 + c/K_A^0)^n$ in Eq (22) can
be approximated with 1, and
$(1 + c/K_A^1)^n \approx (c/K_A^1)^n$. Equation (22) then
reduces to

$$ P(\mathrm{on}) = \frac{c^n}{c^n + \mathcal{L}\,(K_A^1)^n}, \tag{23} $$

where $\mathcal{L} = \exp(L)$ as in Eq (22). We can therefore
identify the parameter $n$ in the MWC model with the Hill
coefficient $h$, and the dissociation constant of the Hill
model, $K_d$, with $K_d = \mathcal{L}^{1/n} K_A^1$.

FIG. 4: A schematic diagram of the MWC model. Two possible
states of the promoter, "on" and "off", are separated by an
energy barrier of $L$. There are 3 binding sites for the
transcription factor in this example, to which TFs bind
independently; their binding energy, however, depends on the
state of the promoter. Here, 2 of the 3 sites are occupied, and
are contributing $E_A^{0,1} - \mu$ each to the total free
energy. If the promoter is "off," there is no transcription; if
it is "on," transcription proceeds at rate $R$ and gives rise to
$R\tau$ molecules of output in steady state at full induction.

In general, for a single gene, the MWC model is not much
different from Hill functions, producing sigmoidal curves that
don't necessarily cover the whole range from 0 to 1 in
induction as the input changes over a wide range. However, in
the limit where $c/K_A^0 \ll 1$, we can easily generalize MWC to
regulation by several transcription factors. To see how,
rewrite Eq (22) as

$$ P(\mathrm{on}) = \frac{1}{1 + e^{F(c)}}, \tag{24} $$

where $F(c) = -n \log(1 + c/K_A^1) + L$. In this picture, the
binding and unbinding of transcription factors simply shifts
the free energy of the "on" vs the "off" state. We can easily
see that if $K$ transcription factors $\mu = A, B, \ldots$ with
concentrations $c_\mu$ regulate the expression of a gene, we can
retain Eq (24), but write

$$ F(\{c_\mu\}) = -\sum_{\mu} n_\mu \log\left(1 + \frac{c_\mu}{K_\mu}\right) + L; \tag{25} $$

it is easy to check that positive $n_\mu$ represent activating
influences, while flipping the sign of $n_\mu$ makes gene $\mu$
repress the expression of g [48].

To summarize, different functional models of gene regulation
presented in this section result in the steady-state output
concentration of the gene, $\bar g(\{c_\mu\}) \propto
P(\mathrm{on})$, being a function of the concentrations of its
TFs, $\{c_\mu\}$. We can think of these functions as nonlinear
input/output relations, $\bar g = \bar g(\{c_\mu\})$, that can
be computed theoretically and, in many cases, mapped out
experimentally [30, 31].
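
Both the Hill limit of Eq (23) and the multi-factor free energy of Eq (25) are easy to explore numerically. The following is a rough sketch under stated assumptions: `p_on_mwc`, `p_on_hill`, `free_energy` and `p_on_multi` are hypothetical helper names, TFs are labelled by dictionary keys, and all numbers are made up to sit inside the regime $c/K_A^0 \ll 1 \ll c/K_A^1$.

```python
import math


def p_on_mwc(c, n, K1, K0, Lcal):
    """Full MWC expression, Eq (22); Lcal stands for exp(L)."""
    on = (1.0 + c / K1) ** n
    return on / (on + Lcal * (1.0 + c / K0) ** n)


def p_on_hill(c, n, K1, Lcal):
    """Eq (23), expected to hold for c/K0 << 1 and c/K1 >> 1."""
    Kd = Lcal ** (1.0 / n) * K1
    return c**n / (c**n + Kd**n)


def free_energy(conc, n_sites, K, L):
    """Eq (25): F = -sum_mu n_mu log(1 + c_mu / K_mu) + L."""
    return L - sum(n_sites[m] * math.log(1.0 + conc[m] / K[m]) for m in conc)


def p_on_multi(conc, n_sites, K, L):
    """Eq (24) evaluated with the multi-factor free energy of Eq (25)."""
    return 1.0 / (1.0 + math.exp(free_energy(conc, n_sites, K, L)))


if __name__ == "__main__":
    # Hill limit: K1 = 1e-3, K0 = 1e3, Lcal = 1e12 gives K_d close to 1.
    for c in (0.5, 1.0, 2.0):
        print(c, p_on_mwc(c, 4, 1e-3, 1e3, 1e12), p_on_hill(c, 4, 1e-3, 1e12))
    # Activator A (n > 0) and repressor B (n < 0) acting on one gene.
    print(p_on_multi({"A": 3.0, "B": 0.1}, {"A": 2, "B": -2},
                     {"A": 1.5, "B": 1.0}, math.log(50.0)))
```

A negative entry in `n_sites` raises the free energy as the corresponding concentration grows, which is how the same formula covers repressors.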



D. Sources of noise in gene expression 

So far we have described several functional models for
transcriptional regulation, and have shown how steady-state
input/output relations, $\bar g = \bar g(\{c\})$, can be derived
from kinetic and thermodynamic considerations. However, as we
mentioned in the Introduction, gene expression is a stochastic
process. What does this mean? In short, it means that the mean
input/output relations are not a full description of the
system. Given an input $c$, the output $g$ will on average have
the value $\bar g(c)$, but will dynamically fluctuate around
this average.

Alluding already to the terminology we are going to introduce
more properly when discussing information transmission, we can
view a genetic regulatory element as a "channel" that takes
inputs $c$ and maps them into outputs $g$. When we say that
there is noise in this mapping, we mean that for a single value
of the input $c$, the output is not uniquely determined.
Instead, there exists a distribution over $g$, $P(g|c)$, that
tells us how likely we are to receive a particular $g$ at the
output if the symbol $c$ was transmitted. This distribution,
$P(g|c)$, can be referred to as the conditional distribution of
responses given the inputs. Once we know this distribution we
can calculate (for continuous variables, such as
concentrations) the mean response, $\bar g(c)$, and the spread
around the mean, characterized by the variance
$\sigma_g^2(c)$:

$$ \bar g(c) = \int dg\; g\, P(g|c), \tag{26} $$
$$ \sigma_g^2(c) = \int dg\; (g - \bar g(c))^2\, P(g|c). \tag{27} $$

These two functions are known as the conditional mean and the
conditional variance, and they can easily be extracted from the
distribution $P(g|c)$, if it is known. A noise-free
deterministic limit is recovered as $\sigma_g(c) \to 0$, in
which case $P(g|c)$ tends to a Dirac-delta distribution,
$P(g|c) = \delta(g - \bar g(c))$.

Unfortunately, the full conditional distribution of responses
given the inputs, $P(g|c)$, is usually only available in
theoretical calculations or simulations, since in reality we
rarely have enough data to sample it. In the case of gene
regulation, sampling would involve changing the input
concentration of TF, $c$, and for each input concentration,
measuring the full distribution of expression levels $g$. More
often than not we only have enough samples to measure a few
moments of the conditional output distribution, perhaps the
conditional mean and conditional variance. Given these
measurements, and a $P(g|c)$ that is not directly accessible by
sampling, we can try making the approximation

$$ P(g|c) \approx \mathcal{G}\big(g;\, \bar g(c),\, \sigma_g^2(c)\big), \tag{28} $$

that is, we assume that $P(g|c)$ is a Gaussian, with some
input-dependent mean and variance. In the presented setting,
the mean input/output response and the noise in the response
cleanly separate: one is given by the conditional mean, and the
other by the conditional variance. The noise can be thought of
as the fluctuations in the output variable while the input is
held fixed. Recall that we are discussing all information
processing systems in equilibrium, that is, when the dynamics
in $g$ has reached steady state (and all variation in $g$ at
given $c$ is due to noise). Having built these intuitions, let
us see how noise can be derived in a simple model of gene
regulation.
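
In practice, the conditional mean and variance of Eqs (26) and (27) are estimated from paired samples, and Eq (28) is then evaluated with those two numbers. A minimal sketch, assuming the inputs take a small set of discrete values so that samples can be grouped by input; the function names and the synthetic data are illustrative assumptions only.

```python
import math
import random
from collections import defaultdict


def conditional_moments(pairs):
    """Return {c: (mean_g, var_g)} from (c, g) samples grouped by input c."""
    groups = defaultdict(list)
    for c, g in pairs:
        groups[c].append(g)
    moments = {}
    for c, gs in groups.items():
        mean = sum(gs) / len(gs)
        var = sum((g - mean) ** 2 for g in gs) / len(gs)
        moments[c] = (mean, var)
    return moments


def gaussian_pdf(g, mean, var):
    """Gaussian approximation to P(g|c), as in Eq (28)."""
    return math.exp(-(g - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "measurements": two input levels with assumed means and spreads.
    pairs = [(1.0, rng.gauss(5.0, 2.0)) for _ in range(5000)]
    pairs += [(2.0, rng.gauss(8.0, 1.0)) for _ in range(5000)]
    for c, (mean, var) in sorted(conditional_moments(pairs).items()):
        print(c, mean, var, gaussian_pdf(mean, mean, var))
```

For continuous inputs one would bin $c$ first; the bin width then becomes an extra analysis choice that this sketch does not address.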



E. Derivation of noise for simple gene regulation 

To start, we first return to the simple gene regulation 
scenario of Fig. [T] We will sketch how the noise can be 
derived in this model using the Langevin approximation, 
and give a back-of-the-envelope estimate for the terms
that we do not compute here. The reader is invited to
view the full derivation in Ref [32].

We start with the dynamical equations:

$$ \frac{dn}{dt} = k_+ c\,(1 - n) - k_- n + \xi_n, \tag{29} $$
$$ \frac{dg}{dt} = R\,n - \frac{g}{\tau} + \xi_g, \tag{30} $$

where again we take the binding site occupancy $n$ to be between
0 and 1, and the expression level of the output gene is $g$;
$g$ is produced with rate $R$ when the binding site is occupied,
and the proteins have a lifetime of $\tau$. We have already
shown that the equilibrium solution of this system is
$\bar n = k_+ c/(k_+ c + k_-)$ and $\bar g = (R\tau)\,\bar n$.
Here we are interested in the fluctuations, $\sigma_g(c)$,
around the steady state, that arise purely due to intrinsic
noise sources: (i) the fact that the binding site only has two
binary states that switch on some characteristic timescale,
(ii) the fact that we make a finite number of discrete proteins
at the output, and (iii) the fact that the input concentration
$c$ might itself fluctuate at the binding site location.

One approach would be to simulate the system of Eqs (29, 30)
exactly using the Gillespie stochastic simulation algorithm
(SSA) [21]. For a given and fixed level of input $c$, the
results of 20 such simulation runs are shown in Fig. 5.
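
A simulation of this kind can be sketched in a few lines. The code below is not the simulation behind the figure; it is a minimal Gillespie-style sketch that assumes four elementary reactions (TF binding, TF unbinding, protein production while bound, protein degradation) and uses made-up rate constants. The names `gillespie_final_count` and `mean_and_variance` are hypothetical.

```python
import random


def gillespie_final_count(c, k_on, k_off, R, tau, t_end, rng):
    """Run one Gillespie trajectory and return the protein count g at t_end.

    State: n in {0, 1} is the promoter occupancy, g is the protein copy number.
    """
    t, n, g = 0.0, 0, 0
    while True:
        rates = (
            k_on * c * (1 - n),  # TF binds the empty site:  n: 0 -> 1
            k_off * n,           # TF leaves the bound site: n: 1 -> 0
            R * n,               # protein made while bound: g -> g + 1
            g / tau,             # protein degraded:         g -> g - 1
        )
        total = sum(rates)
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return g
        u = rng.random() * total     # choose which reaction fires
        if u < rates[0]:
            n = 1
        elif u < rates[0] + rates[1]:
            n = 0
        elif u < rates[0] + rates[1] + rates[2]:
            g += 1
        elif g > 0:
            g -= 1


def mean_and_variance(samples):
    """Sample mean and (population) variance of a list of numbers."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed values (time in minutes): n_bar = 0.5 and R * tau * n_bar = 10.
    pars = dict(c=1.0, k_on=1.0, k_off=1.0, R=2.0, tau=10.0, t_end=100.0)
    runs = [gillespie_final_count(rng=rng, **pars) for _ in range(1000)]
    print(mean_and_variance(runs))
```

With these assumed rates the mean count should settle near $R\tau\bar n = 10$, and the variance should exceed the mean because promoter switching adds to the Poisson shot noise.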

To compute this noise analytically instead of using the
simulation, we have introduced random Langevin forces $\xi_n$,
$\xi_g$. Consider the second equation, Eq (30). A single
protein is produced anew, or is degraded, as an elementary step
(since you don't make half a protein). In equilibrium, the
production term $R\bar n$ balances the degradation term,
$\bar g/\tau$. Now consider some time $T$ in which
$R T \bar n = \bar g T/\tau \approx 1$, i.e. one molecule is
produced or destroyed on average and with equal probability.
While the expected change in the total number in equilibrium in
time $T$ is zero, the variance is not: the variance is equal to
$\frac{1}{2} \times$ (production of 1 molecule)$^2$ +
$\frac{1}{2} \times$ (degradation of 1 molecule)$^2$ = 1. In
general, the variance will be $T(R\bar n + \bar g/\tau)$ if we
measure for time $T$. If you are familiar with random walks in
1D, this sounds very familiar: the mean displacement is 0
(because "leftwards steps" = steps that decrease protein copy
number, and "rightwards steps" = steps that increase protein
copy number, are equally likely), but the variance in
displacement from the origin grows with time $T$.

FIG. 5: A fully stochastic simulation of a simple model of gene
expression using reactions specified in Eqs (29, 30). The
simulation starts with $g(t = 0) = 0$ proteins; the steady
state is reached after about 70 minutes. On the left, the
trajectories of 20 simulation runs. On the right, the mean
trajectory plotted in a solid line; mean ± 1 std plotted in
dashed lines. The envelope measures the steady state level of
noise due to (i) random promoter switching and (ii) the shot
noise in producing the output molecules.

Statistical physics tells us that in order to reproduce this
variance in a dynamical system, we have to insert Langevin
forces with the following prescription:

$$ \langle \xi_g(t)\, \xi_g(t') \rangle = (R\bar n + \bar g/\tau)\, \delta(t - t'). \tag{31} $$

The mean random force is zero, it is uncorrelated in time, and
it has an amplitude such that the random kicks have variance
equal to the leftward and rightward step size; this will
recover our intuition about 1D random walks. We note that the
Gaussian assumption holds for large copy numbers and that at
very short timescales the assumption of temporally uncorrelated
noise can break down. Similarly,
$\langle \xi_n(t)\, \xi_n(t') \rangle = (k_+ c\,(1 - \bar n) + k_- \bar n)\, \delta(t - t')$.

To proceed, we first linearize Eqs (29, 30) around the
equilibrium, by writing $n(t) = \bar n + \delta n(t)$,
$g(t) = \bar g + \delta g(t)$. Then we introduce Fourier
transforms:

$$ \delta n(t) = \int \frac{d\omega}{2\pi}\; \delta\tilde n(\omega)\, e^{-i\omega t}, \tag{32} $$
$$ \delta g(t) = \int \frac{d\omega}{2\pi}\; \delta\tilde g(\omega)\, e^{-i\omega t}. \tag{33} $$

In Fourier space the correlators of $\xi_n$ and $\xi_g$ are
simply $\langle \tilde\xi_n \tilde\xi_n^* \rangle = 2 k_- \bar n$
and $\langle \tilde\xi_g \tilde\xi_g^* \rangle = 2 R \bar n$,
respectively (because the Fourier transform of a delta-function
is 1, and we have also used the fact that in equilibrium, the
two terms that contribute to the magnitude of each Langevin
force are equal).

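With this in mind, the system of equations in Fourier space
(denoted by tildes) now reads:

$$ -i\omega\, \delta\tilde n = -\frac{\delta\tilde n}{\tau_c} + \tilde\xi_n, \tag{34} $$
$$ -i\omega\, \delta\tilde g = R\, \delta\tilde n - \frac{\delta\tilde g}{\tau} + \tilde\xi_g, \tag{35} $$

where $\tau_c^{-1} = k_+ c + k_-$.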

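We ultimately want to compute $\sigma_g(c)$. The total variance
is composed of fluctuations at each frequency $\omega$,
integrated over frequencies [81]:

$$ \sigma_g^2(c) = \int \frac{d\omega}{2\pi}\, \langle \delta\tilde g(\omega)\, \delta\tilde g^*(\omega) \rangle = \int \frac{d\omega}{2\pi}\, S_g(\omega), \tag{36} $$

where $S_g(\omega)$ is called the noise power spectral density
of $g$, and the asterisk denotes the complex conjugate. We see
that we need to solve for $\delta\tilde g$ first from
Eqs (34, 35):

$$ \delta\tilde g = \frac{R\, \tilde\xi_n}{(-i\omega + \tau_c^{-1})(-i\omega + \tau^{-1})} + \frac{\tilde\xi_g}{-i\omega + \tau^{-1}}. \tag{37} $$

Next, we compute
$\langle \delta\tilde g(\omega)\, \delta\tilde g^*(\omega) \rangle$.
Recalling the definitions of
$\langle \tilde\xi \tilde\xi^* \rangle$ [Eq (31)], and using the
fact that $\xi_n$ and $\xi_g$ are uncorrelated with each other,
we find that

$$ S_g(\omega) = \frac{2 R^2 k_- \bar n}{(\omega^2 + \tau_c^{-2})(\omega^2 + \tau^{-2})} + \frac{2 R \bar n}{\omega^2 + \tau^{-2}}. \tag{38} $$
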
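The binding and unbinding of the promoter is usually much
faster than the protein decay time, $\tau_c \ll \tau$. Using
this and the fact that
$\int_{-\infty}^{\infty} dx\, (x^2 + 1)^{-1} = \pi$, we finally
find

$$ \sigma_g^2(c) = \bar g(c) + \frac{(R\tau)^2}{k_- \tau}\; \bar n\, (1 - \bar n)^2. \tag{39} $$

If we normalize the expression level $g$ such that it ranges
from 0 (no induction) to 1 (full induction) by rescaling
$g \to g/(R\tau)$, then the noise in $g$ is

$$ \sigma_g^2(c) = \frac{1}{R\tau}\, \bar g + \frac{1}{k_- \tau}\, \bar g\, (1 - \bar g)^2. \tag{40} $$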
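
The frequency integral can also be done numerically, which gives a quick check on the fast-switching approximation. The sketch below assumes a power spectrum made of a promoter-switching part and a shot-noise part, as follows from the linearized equations, and integrates it with a plain midpoint rule; the function names and parameter values are illustrative assumptions.

```python
import math


def spectrum(w, R, k_off, nbar, tau_c, tau):
    """Assumed noise power spectrum S_g(omega): switching part + shot part."""
    switching = (2.0 * R**2 * k_off * nbar
                 / ((w**2 + tau_c**-2) * (w**2 + tau**-2)))
    shot = 2.0 * R * nbar / (w**2 + tau**-2)
    return switching + shot


def variance_numeric(R, k_on_c, k_off, tau, steps=100000):
    """Integrate S_g(omega) d omega / (2 pi) with the midpoint rule.

    The substitution omega = tan(theta) / tau maps the infinite frequency
    axis onto theta in (-pi/2, pi/2).
    """
    tau_c = 1.0 / (k_on_c + k_off)
    nbar = k_on_c * tau_c
    dtheta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = -0.5 * math.pi + (i + 0.5) * dtheta
        w = math.tan(theta) / tau
        jacobian = 1.0 / (tau * math.cos(theta) ** 2)  # d omega / d theta
        total += spectrum(w, R, k_off, nbar, tau_c, tau) * jacobian * dtheta
    return total / (2.0 * math.pi)


def variance_eq39(R, k_on_c, k_off, tau):
    """Closed form of Eq (39), expected to hold when tau_c << tau."""
    nbar = k_on_c / (k_on_c + k_off)
    return (R * tau * nbar
            + (R * tau) ** 2 / (k_off * tau) * nbar * (1.0 - nbar) ** 2)


if __name__ == "__main__":
    for k in (50.0, 1.0):  # fast and slow promoter switching (assumed rates)
        print(k, variance_numeric(2.0, k, k, 10.0), variance_eq39(2.0, k, k, 10.0))
```

For fast switching the two numbers agree closely; when $\tau_c$ is not negligible compared with $\tau$, the numerical integral is smaller than Eq (39) because the switching term picks up a factor $\tau/(\tau + \tau_c)$.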



Our result is lacking at least one important contribution to
the total noise. The formal derivation of this missing term is
involved [22], so we will estimate it here up to a prefactor.
In our derivation we have not taken into account that the
molecules of transcription factor are brought to the binding
site by diffusion. The diffusive arrival of molecules into a
small volume around the binding site is a random process as
well: it will induce some noise in the occupancy of the binding
site, and thus in the expression level $g$. This is the
contribution we are going to estimate.

Suppose that the binding site is fully contained in a physical
box of side $a$. When the average TF concentration in the
nucleus is fixed at $c$, the average number of molecules in the
box is $N = a^3 c$. This, however, is only the mean number; if
we were to actually sample many times the number of molecules
in the box, we would find that our counts are distributed in a
Poisson fashion, with a variance equal to the mean:
$\sigma_N^2 = N$. This is again just the familiar shot noise,
now appearing at the input side.

How can one reduce the fluctuations $\sigma_N$? As always, one
can make more independent measurements, and average the noise
away. With $M$ independent measurements, the effective noise
should decrease, $\sigma_{N,\mathrm{eff}}^2 = \sigma_N^2/M$.
Suppose the binding site measures for a time $\tau$ (the
protein lifetime, the longest time in the system). How many
independent measurements were made in the best possible case?
It takes $t_0 = a^2/D$ time for the molecules to diffuse out of
the box of size $a$ and be replaced with new molecules; if we
take snapshots and count the molecules at intervals shorter
than $t_0$, we are not making independent measurements.
Therefore $M = \tau/t_0 = \tau D/a^2$. Plugging this into the
expression for effective noise, we find
$\sigma_{N,\mathrm{eff}}^2 = a^3 c \times a^2/(D\tau)$. Since
$N = a^3 c$, it follows that $\sigma_N^2 = a^6 \sigma_c^2$, and
finally:

$$ \sigma_{c,\mathrm{eff}}^2 = \frac{c}{D a \tau}. \tag{41} $$



Equation (41) is a fundamental result: any detector of linear
size $a$ measuring concentration $c$, to which ligands are
transported by diffusion with coefficient $D$, and making
measurements for time $\tau$, will suffer from an error in the
measured concentration given by $\sigma_c$. This contribution
to the noise is called diffusive noise, and it is a special
form of input noise.

To assess how this input noise maps into the noise in the gene
expression $g$, note that any (small) error at the input can be
propagated to the output through the input/output relation,
$\bar g(c)$ [see Fig. 6]:

$$ \sigma_g^2 = \left( \frac{d\bar g}{dc} \right)^2 \sigma_c^2. \tag{42} $$



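Adding the diffusive noise to the previously computed terms in
Eq (40), we find:

$$ \sigma_g^2(c) = \frac{1}{R\tau}\, \bar g + \frac{1}{k_- \tau}\, \bar g\, (1 - \bar g)^2 + \left( \frac{d\bar g}{dc} \right)^2 \frac{c}{D a \tau}. \tag{43} $$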
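
To get a feeling for how the three terms compare along an input/output curve, they can be tabulated numerically. The sketch below assumes a Hill-type mean response $\bar g(c) = c^h/(c^h + K^h)$ with normalized $\bar g$, and takes the three terms to be $\bar g/(R\tau)$, $\bar g(1-\bar g)^2/(k_-\tau)$ and $(d\bar g/dc)^2\, c/(Da\tau)$; the grouping of parameters into `R_tau`, `k_off_tau` and `D_a_tau`, and all numerical values, are illustrative assumptions.

```python
def noise_terms(c, K, h, R_tau, k_off_tau, D_a_tau):
    """Return (g, output, switching, diffusion) variance terms at input c.

    g is the normalized Hill response; the three terms follow the form of
    Eq (43). D_a_tau must be in units of inverse concentration.
    """
    g = c**h / (c**h + K**h)
    dg_dc = h * g * (1.0 - g) / c          # slope of the Hill function
    output = g / R_tau                     # shot noise of output molecules
    switching = g * (1.0 - g) ** 2 / k_off_tau
    diffusion = dg_dc**2 * c / D_a_tau     # propagated input noise
    return g, output, switching, diffusion


if __name__ == "__main__":
    # Assumed parameter groups, chosen only to make all terms visible.
    for c in (0.25, 0.5, 1.0, 2.0, 4.0):
        g, out, sw, diff = noise_terms(c, K=1.0, h=5, R_tau=100.0,
                                       k_off_tau=50.0, D_a_tau=10.0)
        print(f"c={c:5.2f}  g={g:.3f}  output={out:.2e}  "
              f"switching={sw:.2e}  diffusion={diff:.2e}")
```

The diffusion term is largest near the midpoint of a steep response, where the local slope is large, whereas the output term keeps growing towards full induction.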



Let us stop here with the derivation, interpret the 
terms and summarize what we have learned so far. We 
tried to compute various contributions to the noise in 
the expression of gene g, in a simple regulatory element 
where the TF c regulates g. In any real organism, such a 
small regulatory element will be embedded into the reg- 
ulatory network, and c will experience fluctuations on its 
own that will be transmitted into fluctuations in g, the 
so-called transmitted noise, in addition to intrinsic noise 
calculated here [5^3]. 

FIG. 6: Propagating the noise in the input, $\sigma_c$, through
the mean input/output relation, $\bar g(c)$, into the effective
noise in the output, $\sigma_g$. The variances are related by
the square of the local slope of the input/output curve,
$d\bar g/dc$.

On top of intrinsic and transmitted noise sources, the output
will also fluctuate due to extrinsic noise, because the
cellular environment of the regulatory network is not stable.
But even without these complications, we can identify at least
three contributions intrinsic to the $c \to g$ regulatory
process:



Output noise. This is the first term in Eq (43), where the
variance $\sigma_g^2 \propto \bar g$. Fundamentally, this is a
form of shot noise that arises because we produce a finite
number of discrete output molecules. In the simple setting
discussed here, the proportionality factor really is 1 [when
$g$ is measured in counts, as in Eq (39)], and this is true
Poisson noise, where the variance is equal to the mean. If we
treated the system more realistically, with separate
transcription and translation steps, the proportionality
constant could be different from 1; a more careful derivation
shows that then $\sigma_g^2 = (1 + b)/(R\tau)\, \bar g$, where
$b$ is the burst size, or the number of proteins produced per
single mRNA transcript, on average [32]. This is easy to
understand: the "rare" event is the transcription of an mRNA
molecule, and that has true Poisson noise statistics, but for
each single mRNA the system produces $b$ proteins, and the
variance is thus multiplied by $b$.

Input promoter switching noise. This is the second term in
Eq (43). The source of this noise is binomial switching of the
promoter, as it can only be in an induced ($n = 1$) or empty
($n = 0$) state. If we interpret $\bar n$ as the probability of
being occupied, then the variance must be binomial,
$\bar n (1 - \bar n)$. Fluctuations between empty and full
states of occupancy happen with the timescale $\tau_c$ [see
Eq (34)], and the system averages for time $\tau$, so
$\tau/\tau_c$ independent measurements are made, reducing the
binomial variance to $\bar n (1 - \bar n)\, \tau_c/\tau$. Since
$\tau_c k_- = (1 - \bar n)$ and $\bar n = \bar g$, we recover
the switching term, $\bar g (1 - \bar g)^2/(k_- \tau)$.

This term depends on the microscopic way the promoter is put
together, hence the dependence on the kinetic parameter $k_-$.
Regardless of these details, however, every promoter that has
an "on" and "off" state will experience fluctuations similar in
form to those derived here. In our example, $k_-$ is the rate
of TF unbinding from the binding site and this is usually
assumed to be very fast compared to the protein lifetime (in
other words, the occupancy of the promoter is equilibrated on
the timescale of protein production). In other scenarios that
effectively induce gene switching, however, this assumption of
fast equilibration might not be true. In particular, attention
has lately been devoted to DNA packing and regulation via
making the genes (in)accessible to transcription using
chromatin modification. The packing/unpacking mechanisms are
thought to occur with slow rates, and such a switching term
might be an important contribution to the total noise in gene
expression [34].

Input diffusion noise. The last term in Eq (43), as discussed,
captures the intuition that even with a fixed average
concentration $c$ in the nucleus (that is, even if $c$ did not
undergo any fluctuation relating to its own production,
degradation and regulation), there would still be local
fluctuations at the TF binding site location, causing noise in
$g$. This contribution is important when $c$ is present at low
concentrations. As an exercise, one can consider the
approximate relevance of this term in the case of prokaryotic
transcriptional regulation, where
$D \sim 1\ \mu\mathrm{m}^2/\mathrm{s}$, the size of the binding
site is $a \sim 3$ nm, the relevant TF concentrations are in
the nanomolar range, and the integration times are minutes. It
has been shown that this kind of noise also represents a
physical limit in the sense that it is independent of the
molecular machinery at the promoter, as long as the predominant
TF transport mechanism is free diffusion.
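
The exercise suggested above reduces to evaluating the fractional error $\sigma_c/c = 1/\sqrt{D a c \tau}$, which follows from $\sigma_c^2 = c/(Da\tau)$. A minimal sketch, assuming the order-of-magnitude values quoted in the text and the conversion 1 nM ≈ 0.6 molecules per cubic micron; the function name and the specific concentration and time choices are illustrative.

```python
import math

# 1 nM = 1e-9 mol/L * 6.022e23 /mol / (1e15 um^3 per L)
MOLECULES_PER_UM3_PER_NM = 0.6022


def relative_diffusive_noise(D_um2_s, a_um, c_nM, tau_s):
    """Fractional concentration error sigma_c / c = 1 / sqrt(D a c tau)."""
    c = c_nM * MOLECULES_PER_UM3_PER_NM  # molecules per cubic micron
    return 1.0 / math.sqrt(D_um2_s * a_um * c * tau_s)


if __name__ == "__main__":
    D, a = 1.0, 0.003  # um^2/s and um (3 nm), as quoted in the text
    for c_nM in (1.0, 10.0, 100.0):
        for tau_s in (60.0, 600.0):
            print(c_nM, tau_s, relative_diffusive_noise(D, a, c_nM, tau_s))
```

Since the estimate is only valid up to a prefactor, the outputs should be read as orders of magnitude; they nevertheless show that at nanomolar concentrations and minute-long integration times the fractional error is far from negligible.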

What we presented here theoretically was a simple example, but
how does it relate to experiment? Figure 7 shows that our
simple model incorporating only the output and the input
diffusive noise contributions is an excellent description of
data from early fly development. The two fitted parameters give
the magnitudes of the two respective noise sources, and their
values match the values estimated from known parameters and
concentrations [32]. We note that the prominent contribution of
input noise seems to be a hallmark of noise in eukaryotic (but
not prokaryotic) gene regulation.

Let us briefly summarize our observation about the 
noise: 

(i) Not only can we make models for mean input/output
relations $\bar g(\{c_\mu\})$, but we can compute the noise
itself, as a function of the input, $\sigma_g(\{c_\mu\})$.
Noise behavior is connected to the kinetic rates of molecular
events, which are inaccessible in any equilibrium measurement
of mean input/output behavior. Therefore, if noise is
experimentally accessible, it provides a powerful complementary
source of information about transcriptional regulation.

(ii) There are fundamental (physical) sources of noise 
which biology cannot avoid by any "clever" choice of reg- 
ulatory apparatus; thus the precision of every regulatory 
process must be limited. These sources all fundamentally 
trace back to the finite, discrete and stochastic nature 
of molecular events. In theory, the corresponding noise 
terms thus have simple, universal forms, and we can hope 
to measure them in the experiment. 

(iii) There are sources of noise in addition to the
fundamental, intrinsic ones, including extrinsic, experimental,
etc. The hallmark of a good experiment is the ability to
separate these sources by clever experimental design and/or
analysis; see e.g. Refs [35, 36].

FIG. 7: The behavior of noise in hunchback expression,
$\sigma_g$, as a function of the mean induction level of
hunchback, $\bar g \in [0,1]$; reproduced from Ref [32] with
data from Ref [35]. Data points (black circles) show the
measurement in 9 fly embryos at nuclear cycle 14; each point is
an average across nuclei receiving the same input concentration
of bicoid, error bars are std across embryos. Lines show model
fits: the blue dashed line is a two-parameter fit with input
switching and output noise contributions; the red dashed line
is a two-parameter fit with input diffusion and output noise
contributions, assuming step-like regulation of bcd/hb
(infinite Hill coefficient); the solid black line is a
two-parameter fit with input diffusion and output noise
contributions, with the mean input/output relation $\bar g(c)$
inferred from the data (Hill coefficient ~ 5). The black line
is a very good fit to the data, indicating that the diffusion
input noise contribution, responsible for the peak, is
dominant, while the output noise contribution, responsible for
the noise magnitude at full induction where $\bar g = 1$, is
smaller.



III. INTRODUCTION TO INFORMATION 
THEORY 

A. Statistical dependency 

Up to this point we have stressed the role of noise in bi- 
ological networks and mentioned several times that noise 
limits the ability of the network to transmit information; 
in this section we will turn this intuition into a mathe- 
matical statement. 

Recall that in our introduction to noise, we started with a
probabilistic description of an information transmission
system: given some input $c$, the system will map it into the
output $g$ using a probabilistic mapping, $P(g|c)$. If there
were no noise, there would be no ambiguity, and
$g = \bar g(c)$ would be a one-to-one function.

Suppose that the inputs are drawn from some distribution $P(c)$
and fed into the system, which responds with the appropriate
$g$. Then, pairs of input/output symbols are distributed
jointly according to

$$ P(c, g) = P(g|c)\, P(c). \tag{44} $$

In what follows, we will be concerned with finding ways 
to measure how strongly the inputs (c) and the outputs 
(g) are dependent on each other. It will turn out that 
the general measure of interdependency will be tightly 
related to the concept of information. 

Suppose that our information transmission "black box" were a
hoax, and instead of encoding $c$ into $g$ in some fashion, the
system simply returned a random value for $g$ no matter the
input $c$. Then $c$ and $g$ would be statistically independent,
and $P(c, g) = P(c)P(g)$; such a box could not be used to
transmit any information. As long as this is not true, however,
there will be some statistical relation between $c$ and $g$,
and we want to find a measure that would quantify "how much"
one can know, in principle, about the value of $c$ by receiving
outputs $g$, given that there is some input/output relation
$P(g|c)$ and some distribution of input symbols $P(c)$.

The first quantity that comes to mind as the interdependency
measure between $c$ and $g$ is just the covariance:

$$ \mathrm{Cov}(c, g) = \int dc \int dg\; (c - \bar c)(g - \bar g)\, P(c, g); \tag{45} $$

it is not hard, however, to construct cases in which the
covariance is 0, yet $c$ and $g$ are statistically dependent.
Covariance alone (or the correlation coefficient) only tells us
whether $c$ and $g$ are linearly related, but there are many
possible nonlinear relationships that covariance does not
detect; for example, see Fig. 8.
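
A concrete instance of such a case is a set of points spread evenly on a circle. The short sketch below, with a hypothetical helper `covariance` that implements the sample analogue of the covariance integral, compares this circular relation with a linear one; the choice of 360 points is arbitrary.

```python
import math


def covariance(xs, ys):
    """Sample covariance of paired values (discrete analogue of Eq (45))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Points spread evenly on the unit circle: y is tightly constrained by x
# (y is either +sqrt(1 - x^2) or -sqrt(1 - x^2)), yet the covariance vanishes.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
xs = [math.cos(a) for a in angles]
ys_circle = [math.sin(a) for a in angles]

# A linear relation between the same x values, for comparison.
ys_linear = [2.0 * x + 1.0 for x in xs]

if __name__ == "__main__":
    print("circle:", covariance(xs, ys_circle))
    print("linear:", covariance(xs, ys_linear))
```

The circular data give a covariance that is zero up to rounding error, although knowing $x$ narrows $y$ down to two values; the linear data give a clearly non-zero covariance.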

Moreover, we would like our dependency measure to be very
general (free of assumptions about the form of the probability
distribution that generated the data) and definable for both
continuous and discrete outputs. We will claim, following
Shannon [37], that there is a unique assumption-free measure of
interdependency, called the mutual information between $c$ and
$g$. First, let us build some intuition.

B. Entropy and mutual information 

In a gene regulatory network, the concentrations of the input
regulatory signal $c$ and the output effector protein $g$ are
(nonlinearly) related through some noisy input/output relation.
We want to consider how much information the input signal
conveys about the level of the output. In general, we have an
intuitive idea of information, which is schematized in Fig. 9.
In an experiment we could measure pairs of $(c, g)$ values
while the network performs its function, and scatterplot them
as in Fig. 9. The line represents a smooth (mean) input/output
relation and guides our eyes. In the case of the mock
measurements in Fig. 9A, knowing the value of the output would
tell us only a little about which value of the input generated
it (or vice versa: knowing the input constrains the value of
the output quite poorly). However, in the case of the
input/output relation in Fig. 9B, knowing the value of the
output would reduce our uncertainty about the input by a
significant amount. Intuitively we would be led to say that in
the "noisy" case A there is a small amount of information
between the input and the output, while in case B there is
more. From this example we see that information about $g$
obtained by knowing $c$ can be viewed as a "reduction in
uncertainty" about $g$ due to the knowledge of $c$. In order to
formalize this notion we must first define uncertainty, which
we do by means of the familiar concept of entropy.

FIG. 8: Examples of two variables, drawn from three joint
distributions. Shown are the scatterplots of example draws. On
the left, the variables are linearly correlated, and the
correlation is close to 1. In the middle, the variables are
interdependent, but not in a linear sense. The correlation
coefficient is 0, but measures of statistical dependence, such
as mutual information, give a non-zero value. Note that we are
looking for a general measure of interdependency: if we had a
model that assumes that x and y lie on a circle, we could fit
that particular model or use a measure that makes the circular
assumption. Instead, we would like to find a measure that
detects the dependency without making any assumptions about the
distribution from which the data has been drawn. On the right,
the variables are statistically independent, and both linear
correlation and mutual information give zero signal.

FIG. 9: A schematic depiction of two mock measurements (dots)
of an output $g$ as a function of input $c$. A) A case where
measuring the output does not greatly decrease our uncertainty
about the input. This input/output relation has little
information. B) In this case the input/output relation is
informative: measuring the output significantly reduces our
uncertainty about the input. The grey line denotes a chosen
value of the output, and the arrows mark the uncertainty in the
input for that chosen value of the output.

Physicists often learn about entropy in the microcanonical
ensemble, where it is simply defined as a measure of how many
states are accessible in an isolated system at fixed energy,
volume and particle number. In this case all, say $M$, states
that the system can find itself in are equally likely,
therefore the probability distribution $p_i$ over a set of
states $i$, such as the particle configurations, is uniform,
$p_i = 1/M$. The entropy just counts the number of states,
$S = k_B \log_2 M$. The entropies in other ensembles, including
the canonical ensemble, are then introduced via a Legendre
transform. For example, in the canonical ensemble one allows
for the energy to fluctuate, keeping the mean energy fixed. As
a result, the system can now find itself in many energy states,
with different probabilities. Here we will start with directly
defining the canonical entropy:

$$ S = -\sum_i p_i \log_2 p_i, \tag{46} $$

which will be a key quantity of interest. In information 
theory and computer science the canonical entropy (up 
to the choice of units) is referred to as Shannon entropy. 
The intuition behind this form of entropy is similar to 
that of the microcanonical entropy - it counts the num- 
ber of accessible states, but now all of these states need 
not be equally likely. We are trying to define a measure of 
"accesible states," but if their probabilities are unequal, 
some of the states are in fact less accessible than others. 
To correct for this we must weigh the log2 Pi contribution 
to the entropy by the the probability pi of observing that 
state, S = —^^pi log Pi. By convention used in infor- 
mation theory we chose the units where ksT = 1. The 
logarithm base 2 defines a unit called a bit, which is an 
entropy of a binary variable that has two equally acces- 
sible states. In general in the case of M equally probable 
states, we recover 

$$ S = -\sum_{i=1}^{M} \frac{1}{M} \log_2 \frac{1}{M} = \log_2 M \ \text{[bits]}. \tag{47} $$

According to this formula, the uncertainty in the out- 
come of a fair coin toss is 1 bit, whereas the uncertainty 
of an outcome with a biased coin is necessarily less than 
1 bit, allowing the owner of such a coin to make money 
in betting games. Entropy is nothing else but a measure
of the uncertainty of a random variable distributed according
to a given distribution $P = \{p_i\}$, $i = 1, \ldots, M$.
Entropy is non-negative, measured in bits, and in the
discrete case always takes a value between two limits:
$0 \le S[P] \le \log_2 M$. The entropy (uncertainty) is zero
when the distribution has its whole weight of 1 concentrated
at a single $i$. The entropy (uncertainty) is maximal
when $p_i = 1/M$, i.e. $P$ is a uniform distribution.
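
The limits quoted above can be checked with a few lines of code. The following is a minimal sketch of Eq (46), not part of the original analysis; the function name `shannon_entropy` and the example coin bias of 0.9 are illustrative assumptions.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, S = -sum_i p_i log2 p_i, as in Eq (46)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with p_i = 0 contribute nothing, since p log p -> 0 as p -> 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])      # two equally likely outcomes
biased_coin = shannon_entropy([0.9, 0.1])    # hypothetical bias, for illustration
uniform_8 = shannon_entropy([1 / 8] * 8)     # M = 8 equally likely states
certain = shannon_entropy([1.0, 0.0])        # all weight on a single state

print(fair_coin, biased_coin, uniform_8, certain)
```

The fair coin and the uniform distribution reach the upper limit $\log_2 M$, the biased coin falls below 1 bit, and the fully concentrated distribution has zero entropy.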

The notion of Shannon entropy generalizes to con- 
tinuous distributions, and to functions of several vari- 
ables, such as concentrations of many types of proteins 
$\mathbf{c} = \{c_1, c_2, \ldots, c_m\}$ in a gene regulatory network:

$$ S[p(\mathbf{c})] = -\int d\mathbf{c}\, p(\mathbf{c}) \log_2 p(\mathbf{c}). \tag{48} $$

As in thermodynamics, the entropy cannot be mea- 
sured directly. In physics one often measures specific 
heat, which is connected to a difference of entropies, to 
gain insight about the number of configurations accessible
to the system. To illustrate this, consider a cell with concentration
$c$ of proteins that fluctuates around its mean
$\bar{c}$ and is well approximated by a Gaussian of width $\sigma_c$:

$$ P(c) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left[-\frac{(c-\bar{c})^2}{2\sigma_c^2}\right]. \tag{49} $$

Following Eq (48), the entropy of $P(c)$ is
$S = \log_2 \sqrt{2\pi e \sigma_c^2}$. First, we observe that the entropy does
not depend on the mean $\bar{c}$, since the number of accessible
states does not depend on where in phase space these
states are located. Next we see that, somewhat counterintuitively,
the entropy seems to depend on a choice
of units: if the units of concentration (and therefore $\sigma_c$)
change, the value of the entropy changes as well. This is
a reflection of the fact that c is a continuous variable and 
the (discrete) number of accessible states must depend on 
how finely we measure small differences in c; nominally, 
if c were known with arbitrary precision, the number of 
states would be infinite. However, as long as we are only 
interested in difference of entropies, or if we specify the 
measurement precision and discretize c by binning, no 
practical problems arise. [5S] As we will soon show, the 
information measure that we are pursuing is indeed a 
difference of entropies, and the issues with continuous 
distributions will not cause us any problems. 
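
To make the unit dependence concrete, the sketch below evaluates the Gaussian entropy $S = \log_2 \sqrt{2\pi e \sigma_c^2}$ in two unit systems. The widths and the nM to µM conversion are hypothetical numbers chosen for illustration only.

```python
import math

def gaussian_entropy_bits(sigma):
    """Differential entropy of a Gaussian of width sigma, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

# Hypothetical widths of two concentration distributions, in nM.
sigma_a_nM = 20.0
sigma_b_nM = 5.0
nM_to_uM = 1e-3   # changing units rescales every width by the same factor

S_a_nM = gaussian_entropy_bits(sigma_a_nM)
S_a_uM = gaussian_entropy_bits(sigma_a_nM * nM_to_uM)

# The entropy itself shifts by log2 of the unit conversion factor ...
unit_shift = S_a_nM - S_a_uM

# ... but the difference of two entropies is the same in both unit systems.
diff_nM = S_a_nM - gaussian_entropy_bits(sigma_b_nM)
diff_uM = S_a_uM - gaussian_entropy_bits(sigma_b_nM * nM_to_uM)

print(unit_shift, diff_nM, diff_uM)
```

The individual entropies differ by $\log_2 1000$ between the two unit systems, while the entropy difference ($\log_2 4 = 2$ bits here) is unchanged.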

Having discussed entropy as a measure of uncertainty, 
it is time to return to our original goal of computing 
how much our uncertainty about the output g is reduced 
by knowing the value of the input $c$. Let $P(g|c)$ describe
the input/output relation in a $c \to g$ regulatory
element. Then the entropy of this conditional distribution
will measure the uncertainty in $g$ if we know $c$, that
is, it will measure the "number of accessible states" in
$g$ consistent with the constraint that they happen when
input $c$ is presented:

$$ S[P(g|c)] = -\int dg\, P(g|c) \log_2 P(g|c). \tag{50} $$

Note that this entropy still depends on the input $c$
(but no longer on $g$, which has been integrated out).

Now suppose for the moment that we did not know 
the value of the input $c$. In that case the uncertainty
about the value of $g$ would be directly
$S[P(g)] = -\int dg\, P(g) \log_2 P(g)$. If we form a difference of the
two entropies, we can measure how much our uncertainty
about $g$ has been reduced by knowing $c$:

$$ \Delta S = S[P(g)] - S[P(g|c)]. \tag{51} $$

We can repeatedly measure this entropy difference in dif- 
ferent input concentration regimes, and take an average 
according to the distribution P(c) with which the inputs 
are presented. The resulting quantity, central to our dis- 
cussions, is called mutual information: 



$$ I(c;g) = \int dc\, P(c) \left( S[P(g)] - S[P(g|c)] \right). \tag{52} $$



Briefly, this quantity in bits measures how much, on av- 
erage, our uncertainty in one variable (e.g. g) has been 
decreased by knowing the value of a related variable (e.g. 
c). Mutual information is a scalar number (not a function!),
and it is customary to write $c$ and $g$ in parentheses
separated by a semicolon as in Eq (52) to denote between
which two variables the mutual information has
been computed.
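
For discrete variables the integral in Eq (52) becomes a sum, which is easy to evaluate directly. The following sketch is our illustration with made-up binary channels; the names `entropy_bits` and `mutual_information` are assumptions and not notation from the text.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_c, p_g_given_c):
    """Discrete analogue of Eq (52): I = sum_c P(c) (S[P(g)] - S[P(g|c)]).

    p_c[i] is P(c_i); p_g_given_c[i][j] is P(g_j | c_i).
    """
    n_g = len(p_g_given_c[0])
    # Marginal output distribution P(g) = sum_c P(c) P(g|c).
    p_g = [sum(p_c[i] * p_g_given_c[i][j] for i in range(len(p_c)))
           for j in range(n_g)]
    s_g = entropy_bits(p_g)
    return sum(pc * (s_g - entropy_bits(row))
               for pc, row in zip(p_c, p_g_given_c))

p_c = [0.5, 0.5]
noiseless = mutual_information(p_c, [[1.0, 0.0], [0.0, 1.0]])
noisy = mutual_information(p_c, [[0.9, 0.1], [0.1, 0.9]])        # 10% errors
uninformative = mutual_information(p_c, [[0.7, 0.3], [0.7, 0.3]])

print(noiseless, noisy, uninformative)
```

A noiseless binary relation transmits 1 bit, a noisy one transmits less, and an output that ignores the input transmits nothing.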

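Using the definitions of the entropies and of the conditional
distributions,

$$ P(g,c) = P(g|c) P(c), \tag{53} $$

we can reformulate the information between the input
and output as:

\begin{align}
I(c;g) &= \int dc \int dg\, P(c,g) \log_2 \frac{P(c,g)}{P(c)P(g)} \tag{54} \\
&= \int dc\, P(c) \int dg\, P(g|c) \log_2 \frac{P(g|c)}{P(g)} \tag{55} \\
&= \int dg\, P(g) \int dc\, P(c|g) \log_2 \frac{P(c|g)}{P(c)}. \tag{56}
\end{align}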



From this we clearly see that information is a symmet- 
ric quantity - the information the input has about the 
output is the same as the information the output has 
about the input. Hence this measure of information is 
called mutual information. We also clearly see that if
the inputs and outputs are independent, so that the joint
distribution factorizes, $P(c,g) = P(c)P(g)$, then $I(c;g) = 0$. In this case
the entropy of the whole system would be the sum of the
individual entropies. If the variables are not independent,
the entropy of the system is reduced by the mutual
information:

$$ I(c;g) = S[P(c)] + S[P(g)] - S[P(c,g)]. \tag{57} $$
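
The equivalence of Eq (54) and Eq (57), the symmetry in $c$ and $g$, and the vanishing of information for independent variables can all be verified numerically. The sketch below uses small hypothetical joint probability tables; the function names are ours.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Row marginal P(c) and column marginal P(g) of a table joint[c][g]."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def mi_from_joint(joint):
    """Discrete analogue of Eq (54)."""
    p_c, p_g = marginals(joint)
    return sum(p * math.log2(p / (p_c[i] * p_g[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def mi_from_entropies(joint):
    """Discrete analogue of Eq (57): S[P(c)] + S[P(g)] - S[P(c,g)]."""
    p_c, p_g = marginals(joint)
    flat = [p for row in joint for p in row]
    return entropy_bits(p_c) + entropy_bits(p_g) - entropy_bits(flat)

# Hypothetical correlated joint distribution P(c, g).
joint = [[0.4, 0.1],
         [0.2, 0.3]]
transposed = [list(col) for col in zip(*joint)]   # swaps the roles of c and g

# Independent case: P(c, g) = P(c) P(g).
p_c, p_g = [0.3, 0.7], [0.6, 0.4]
independent = [[a * b for b in p_g] for a in p_c]

print(mi_from_joint(joint), mi_from_entropies(joint),
      mi_from_joint(transposed), mi_from_joint(independent))
```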

Mutual information also has other interesting properties: 

• It can be defined for continuous or discrete 
quantities. Mutual information is a functional of
a probability distribution, and probability distributions
are very generic objects: $c$ and $g$ could both
be continuous, or either one or both can be discrete.

• It is reparametrization invariant. The mutual information
between $c$ and $g$ is the same as the mutual
information between any one-to-one function of $c$,
$f(c)$, and any one-to-one function of $g$, $h(g)$, that is
$I(c;g) = I(f(c); h(g))$. In a biological context, this is
a great asset: experiments often report, e.g., intensities
or log-intensities on microarray chips or in
FACS sorting, and there is a lot of discussion about
how these data should be normalized, transformed
or interpreted prior to any analysis, or how the
cells themselves "interpret" their internal concentrations
of TFs. This feature of mutual information
is important because other statistical measures of
correlation, like correlation coefficients, depend on
transformations of the data. Mutual information,
in contrast, is invariant to such reparametrizations
of the variables.

• It obeys data processing inequality. Suppose 
that g depends on c and k depends on g (but not 
directly on c), in some probabilistic fashion. In 
other words, one can imagine that there is a Markov
process, $c \to g \to k$, where arrows denote a noisy
mapping from one value to the next one: $c$ gives
rise to $g$ and $g$ to $k$. Then $I(c;k) \le I(c;g)$, that is,
information necessarily either gets lost or stays the 
same at each noisy step in the transmission process, 
but it is never "spontaneously" created. 

• It has a clear interpretation. If there are $I$ bits
of mutual information between input $c$ and output
$g$, this can be interpreted as there being $2^{I(c;g)}$
distinguishable levels of $g$ that can be reached by
dialing the value of the input, $c$. By "distinguishable"
we mean distinguishable given the intrinsic noise in
the channel $c \to g$.
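
Two of the properties listed above, the data processing inequality and the invariance under relabeling, can be illustrated with a toy Markov chain $c \to g \to k$ built from two noisy binary channels. This is only a sketch under our own assumptions: the error rates 0.05 and 0.10 are arbitrary, and the helper names are hypothetical.

```python
import math

def mi_bits(p_in, channel):
    """I(input; output) in bits; channel[i][j] = P(output_j | input_i)."""
    n_out = len(channel[0])
    p_out = [sum(p_in[i] * channel[i][j] for i in range(len(p_in)))
             for j in range(n_out)]
    total = 0.0
    for i, p_i in enumerate(p_in):
        for j, p_ji in enumerate(channel[i]):
            if p_i > 0 and p_ji > 0:
                total += p_i * p_ji * math.log2(p_ji / p_out[j])
    return total

def compose(first, second):
    """Channel c -> k obtained by chaining c -> g and g -> k."""
    return [[sum(first[i][m] * second[m][j] for m in range(len(second)))
             for j in range(len(second[0]))]
            for i in range(len(first))]

def binary_symmetric(eps):
    """Binary channel that flips its input with probability eps."""
    return [[1 - eps, eps], [eps, 1 - eps]]

p_c = [0.5, 0.5]
c_to_g = binary_symmetric(0.05)   # hypothetical noise in the first step
g_to_k = binary_symmetric(0.10)   # hypothetical noise in the second step

I_cg = mi_bits(p_c, c_to_g)
I_ck = mi_bits(p_c, compose(c_to_g, g_to_k))

# Relabeling the outputs (a one-to-one map of g) leaves the information unchanged.
relabeled = [row[::-1] for row in c_to_g]
I_relabeled = mi_bits(p_c, relabeled)

print(I_cg, I_ck, I_relabeled, 2 ** I_cg)
```

The second noisy step can only lose information, so `I_ck` does not exceed `I_cg`, and `2 ** I_cg` gives the corresponding number of distinguishable output levels.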

There are a number of powerful theorems relating to mutual
information which we will not go into here, but the
interested reader is referred to the classic text of Cover
and Thomas for details [38].

Let us consider an instructive example of a Gaussian
channel. We assume the input/output relation between $c$
and $g$ is linear (or nonlinear, but can be linearized around
the operating point):

$$ g = c + \eta, \tag{58} $$

while the noise in this $c \to g$ process is additive and
drawn from a Gaussian distribution:

$$ P(\eta) = P(g|c) = \frac{1}{\sqrt{2\pi\sigma_\eta^2}} \exp\left[-\frac{(g-c)^2}{2\sigma_\eta^2}\right]; \tag{59} $$

note that in this simple example the variance $\sigma_\eta^2$ is not a
function of $c$ [as it is in our models of gene expression, e.g.
in Eq (43)]. Let us assume that the input $c$ itself is
also a Gaussian distributed random variable, given by the
distribution in Eq (49). Having fixed the distributions of
the noise and the input, this uniquely defines the output
to be Gaussian as well. Using Eq (52) we find the mutual
information between the input and output to be:

$$ I(c;g) = \frac{1}{2} \log_2\left(1 + \frac{\sigma_c^2}{\sigma_\eta^2}\right), \tag{60} $$

where $\sigma_c^2/\sigma_\eta^2$ is the ratio of the signal variance to the noise
variance, often referred to as the signal-to-noise ratio or
SNR.

If, as in our example, the noise is Gaussian and additive,
then one can show that the information transmission
is maximized at fixed input variance when the input is drawn
from a Gaussian distribution, as we assumed above [38].
This is related to the fact that the Gaussian distribution is the
distribution that maximizes the entropy for a fixed variance,
and that information is maximized, according to
Eq (52), when the output (or input) entropies are maximized.
Let us show that the Gaussian distribution really
maximizes the entropy subject to a variance constraint.
We formulate the problem as a constrained optimization
procedure, for the input distribution $P(c)$:

$$ \mathcal{L}[P(c)] = -\int dc\, P(c) \log P(c) - \lambda_0 \int dc\, P(c) - \lambda_1 \int dc\, c\, P(c) - \lambda_2 \int dc\, c^2 P(c). \tag{61} $$

Here, the Lagrange multiplier $\lambda_0$ will enforce that the distribution
is normalized, $\lambda_1$ can be used to fix the mean,
and $\lambda_2$ to constrain the variance. Optimizing with respect
to $P(c)$, $\delta\mathcal{L}/\delta P(c) = 0$, we obtain:

$$ \log P(c) = -1 - \lambda_0 - \lambda_1 c - \lambda_2 c^2. \tag{62} $$

We can complete the square in $c$ and express $P(c)$ to
obtain:

$$ P(c) = Z \exp\left[-\lambda_2 \left(c + \frac{\lambda_1}{2\lambda_2}\right)^2\right], \tag{63} $$

where $Z = \exp(-1 - \lambda_0 + \lambda_1^2/4\lambda_2)$. By making the identifications
$\bar{c} = -\lambda_1/2\lambda_2$ and $\sigma_c^2 = 1/2\lambda_2$, we see that we
can select Lagrange multipliers such that the result of
the entropy maximizing optimization is a Gaussian distribution
with the desired mean and variance. These arguments
together show that for a channel with Gaussian
additive noise with fixed variance, mutual information is
maximized when the input and output variables are chosen
from a Gaussian distribution. The Gaussian channel
result of Eq (60) gives an upper bound on the amount
of information that can be transmitted under these assumptions.
We will return to Gaussian channels when
considering time-dependent solutions in Section IV D.
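
As a quick consistency check of the Gaussian channel result, one can compare the closed form of Eq (60) with the entropy difference of Eq (52), using the Gaussian entropy $S = \log_2\sqrt{2\pi e \sigma^2}$ for both the output and the noise. The variances below are arbitrary illustrative values and the function names are our own.

```python
import math

def gaussian_entropy_bits(variance):
    """Differential entropy of a Gaussian with the given variance, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)

def gaussian_channel_information(var_signal, var_noise):
    """Closed form of Eq (60): I = (1/2) log2(1 + SNR)."""
    return 0.5 * math.log2(1 + var_signal / var_noise)

def information_from_entropies(var_signal, var_noise):
    """Eq (52) for g = c + eta with independent Gaussian c and eta."""
    # The output g is Gaussian with variance var_signal + var_noise.
    s_g = gaussian_entropy_bits(var_signal + var_noise)
    # P(g|c) is the noise distribution, so its entropy is the same for every c.
    s_g_given_c = gaussian_entropy_bits(var_noise)
    return s_g - s_g_given_c

var_signal, var_noise = 3.0, 1.0   # hypothetical values, SNR = 3
I_closed = gaussian_channel_information(var_signal, var_noise)
I_entropy = information_from_entropies(var_signal, var_noise)

print(I_closed, I_entropy)
```

With a signal-to-noise ratio of 3 both routes give 1 bit, i.e. roughly two distinguishable output levels.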

Finally, we present the last example, originally studied
by Laughlin in the context of neural coding of contrast in
fly vision [39, 40], to build intuition about maximal information
solutions. Consider a nonlinear system that
translates an input $c$ to an output via a mean input/output
relation $\bar{g} = \bar{g}(c)$. In Laughlin's case, the
input was the contrast incident on the fly's eye, while
the output was the firing rate of a specific neuron in the
fly visual system; the input/output relation in this case
was an experimentally measurable quantity. The system is
stochastic and so we really measure $g$, which is a random
variable whose mean is given by $\bar{g}(c)$. Let us assume the
noise is additive and constant, i.e. it does not depend on the
value of $c$ [this assumption is the main difference between
this problem in fly vision and the case of gene regulation
which we study below, where both the mean input/output relation
and the noise are functions of $c$]. We can then ask,
as Laughlin did, what distribution of inputs, $P(c)$, will
maximize information transmission through this channel,
by writing down an optimization problem for the information
of Eq (52), while constraining the normalization
of $P(c)$:



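$$ \frac{\delta \mathcal{L}[P(c)]}{\delta P(c)} = \frac{\delta}{\delta P(c)} \left[ S[P(g)] - \int dc\, P(c)\, S[P(g|c)] + \lambda \int dc\, P(c) \right] = 0, \tag{64} $$

where we are considering $P(g|c)$ as given from the experiment
and fixed, and $P(g) = \int dc\, P(c) P(g|c)$. If the noise is
independent of $c$, then the conditional entropy is also a
constant, $S[P(g|c)] = a$. Optimizing $\mathcal{L}$ we obtain:

$$ (-a + \lambda) + \int dg\, \frac{\delta S[P(g)]}{\delta P(g)} P(g|c) = 0. \tag{65} $$

The second term gives:

\begin{align}
& \int dg\, P(g|c) \frac{\delta S[P(g)]}{\delta P(g)} \tag{66} \\
&= \int dg\, P(g|c) \left( -\log P(g) - 1 \right) \tag{67} \\
&\approx -\log P(\bar{g}) - 1, \tag{68}
\end{align}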


where in the last line we have assumed $P(g|c)$ is strongly
peaked around the mean, $\bar{g}(c)$ (this enables us to approximate
the average over $\log P(g)$ with the log of the
distribution of average values). Apart from $\log P(\bar{g})$ all
terms in Eq (65) are constant, hence we have derived the
result that the information-maximizing distribution of mean
outputs is a constant as well:

$$ P(\bar{g}) = \text{const}. \tag{69} $$

Since $P(c)\,dc = P(\bar{g})\,d\bar{g}$, we find using Eq (69) that

$$ P(c) \propto \frac{d\bar{g}(c)}{dc}. \tag{70} $$

The optimal way to encode inputs, given a known input/output
relation $\bar{g}(c)$ and constant noise, is such that
all responses $g$ are used with the same frequency. In
Laughlin's case, this result made a prediction: if the fly
visual system is adapted to the distribution of contrast
levels in the environment, then by measuring $\bar{g}(c)$ one
could predict the distribution of contrast levels in nature
according to Eq (70). This prediction can be checked by
going outdoors and collecting the natural contrast distributions
directly using a properly calibrated camera.
The results matched the predictions beautifully, illustrating
that the fly visual neuron is using its finite dynamic
range of firing rates optimally. In engineering, this encoding
technique, which takes an arbitrary input distribution
$P(c)$ and transforms it into a uniform output distribution
$P(g)$, is known as histogram equalization.
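
The histogram equalization result can be illustrated numerically. In the sketch below we assume, purely for illustration, a Hill-type mean response with hypothetical parameters $K = 1$ and $n = 2$ (this is not Laughlin's measured curve), choose the inputs according to Eq (70), and check that the mean outputs are then used uniformly.

```python
import math

def mean_response(c, K=1.0, n=2.0):
    """Hypothetical Hill-type mean input/output relation g(c)."""
    return c ** n / (c ** n + K ** n)

def optimal_input_masses(c_mid, dc, h=1e-6):
    """Discretized Eq (70): probability per cell proportional to dg/dc."""
    slopes = [(mean_response(c + h) - mean_response(c - h)) / (2 * h)
              for c in c_mid]
    norm = sum(s * dc for s in slopes)
    return [s * dc / norm for s in slopes]

def equalized_output_histogram(c_lo=0.0, c_hi=10.0, n_cells=20000, n_bins=10):
    """Histogram of mean outputs when inputs follow the optimal P(c)."""
    dc = (c_hi - c_lo) / n_cells
    c_mid = [c_lo + (k + 0.5) * dc for k in range(n_cells)]
    masses = optimal_input_masses(c_mid, dc)
    g_lo, g_hi = mean_response(c_lo), mean_response(c_hi)
    hist = [0.0] * n_bins
    for c, mass in zip(c_mid, masses):
        # Assign each input cell to an equal-width bin of the output range.
        frac = (mean_response(c) - g_lo) / (g_hi - g_lo)
        hist[min(int(frac * n_bins), n_bins - 1)] += mass
    return hist

hist = equalized_output_histogram()
print([round(x, 3) for x in hist])
```

Up to discretization error, every output bin receives the same probability, which is the uniform $P(\bar{g})$ of Eq (69).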

We emphasize again that this result is only true if the 
noise is constant, and if the distribution $P(g|c)$ is tightly
peaked around the mean value, $\bar{g}(c)$. If these assumptions
do not hold, but the range of inputs is constrained, the 
optimal input distributions may be discrete (a sum of 
delta functions) [H]. In Section IV A we shall see how 
the optimal input and output distributions change when 
the noise depends explicitly on the input c. 

In this review we are considering how information is 
transmitted between the input of a gene regulatory net- 
work and its output. Apart from mutual information 
that we are using, there exist other measures of information,
which ask slightly different questions. For example,
Fisher information tells us how well one can estimate (in
an L2-norm sense) the value of an unknown parameter $\theta$
that determines the probability distribution from which
measurements are drawn. However, just as Fisher information
makes assumptions about the "error metric"
(L2 norm), so do other measures make alternative assumptions
either about the distributions from which the
data are drawn or about the error metric. Shannon has
shown that mutual information alone provides a unique,
assumption-free measure of dependency for any choice of
$P(c,g)$ [37].



C. Information transmission as a measure of 
network function 

In previous sections we laid down the mathematical 
foundations for describing gene regulation and intro- 
duced the concept of information transmission between 
inputs and outputs of noisy channels. Before bringing the 
two topics together and showing how information trans- 
mission applies to genetic regulatory networks, we should 
ask ourselves why information transmission might even 
be a feasible measure of network function. In this section 
we briefly review some experimental justifications for this
approach. 

The main criticism against any given (mathematically 
definable or tractable) measure of function, including in- 
formation, is the lack of arguments why this particular 
measure should be singled out from other candidate mea- 
sures. In words it is clear that "selection is acting on the 
function," but the biological function in this context is 
often thought to be some arbitrarily complicated math- 
ematical function that could weight many aspects of the 
network together in some incomprehensible way. Precisely
for this reason we use mutual information: regardless
of what exactly the biological function is and how
the network processes the inputs, according to Shannon,
there has to be some minimal amount of transmitted in- 
formation to support this biological function. 

A stronger criticism states that the examples of networks
in extant species observable today are not yet optimized
for biological function, whatever that might be.
If they are not even close to the extremal point and the 
space of the networks is large, then the observable net- 
works today could be viewed purely as results of their 
ancestry, as random draws from a huge space of possible 
networks that perform the biological function just "well 
enough" for the organisms to survive. This is certainly 
a valid criticism, but it is hard to see what one can do 
about it a priori. However, if it turns out that the net- 
works observed today are at (or close to) the extremum 
of some measure of function that we postulate, such as- 
sumptions might be validated a posteriori. While valid, 
this criticism should therefore not prevent us from trying 
to find relevant network design principles. 

On the other hand, networks do have to obey physical 
laws and constraints, such as the limitations in accuracy 
of any network function due to stochasticity in gene reg- 
ulation. It is therefore interesting to explore how these 
limits translate into observable circuit properties. There 
could be other constraints shaping the network structure 
apart from noise: the metabolic cost to the number of sig- 
naling molecules used by the network, or the constraint 
on the speed of signaling etc. We decided to concentrate 
on the noise constraint (which is indeed related to the 
constraint on the number of signaling molecules, as we 
will show later) because it has a physical basis relevant for
all networks, and because it can be measured in today's 
experiments. 

Taken together, we realize that not all (if any) gene 
regulatory networks are solely optimized to transmit in- 
formation. However, as we have argued above, mutual 
information is in some sense a minimal measure: any 
network that performs whichever biological function well 
will have to keep noise in check, and better performance 
of that function will imply smaller noise and thus larger 
values of transmitted information. In this sense our ap- 
proach can fail if constraints other than noise are domi- 
nant: then information will fail to discriminate between 
good networks (in information sense) that nevertheless 
differ strongly in terms of these remaining constraints. 

As we show below, the principle of maximizing infor- 
mation in genetic networks is predictive about network 
structure. Therefore, theoretical results can be compared 
to experiments, which in turn can give us insight into 
other principles and constraints at play in nature. In 
the long run we are thus hoping for a productive interac- 
tion between theory and experiment that systematically 
reveals various determinants of genetic regulatory net- 
works. 

One of the systems in which ideas about information
transmission in genetic regulatory networks could
be tested has been the early embryonic development of
Drosophila. This genetic model organism is a prime example
of spatial patterning, where nuclei in the early embryo,
though they all share the same DNA, initiate different 
programs of gene expression based on a small number of 
maternal chemical cues. These precise and reproducible 
spatial domains of differential gene expression in the em- 
bryo that later lead to patches of cells with distinct devel- 
opmental fates have been extensively studied, as has been 
the nature of the maternal cues, called maternal morphogens.
In genetics and molecular biology researchers
have thus already introduced the concept of "positional information"
encoded in the maternal morphogens, which
is read out by the developmental regulatory network, but 
this concept has not been defined mathematically. In the 
following paragraphs we will very briefly outline the bi- 
ology of early Drosophila development, review the rele- 
vant measurements, and proceed to connect them to the 
framework we built in the preceding sections. 

When a Drosophila egg is produced by the mother, the 
mother deposits mRNA of a gene called Bicoid in the anterior
portion of the egg. These mRNAs are translated
into bicoid protein, which diffuses towards the posterior,
establishing a decaying anterior-posterior protein
gradient (see Fig 10). The maternal morphogen bicoid
acts as a transcription factor for four downstream genes,
known as "gap genes" (Hunchback, Krüppel, Knirps and
Giant). Looking along the long axis of the ellipsoidal 
egg, known as the AP (anterior-posterior) axis, one can 
see about 100 rows of nuclei at cell cycle 14, about 2 
hours after egg deposition, when the nuclei still uniformly 
tile the surface of the egg and before large morphologi- 
cal rearrangements, called gastrulation, start to occur. 
These nuclei express proteins (mostly transcription fac- 
tors) that will confer cell fate: nuclei belonging to various 
spatial domains of the embryo express specific combina- 
tions of genes that will lead these nuclei to become pre- 
cursors of different tissues. Stainings for relevant tran- 
scription factors have shown a remarkable degree of preci- 
sion with which the spatial domain boundaries are drawn 
in each single embryo, and a stunning reproducibility in 
positioning of these domains between embryos. Although 
probably a slight overgeneralization, we can say that at 
the end of cell cycle 14, along the AP axis, each row of 
nuclei reliably and reproducibly expresses a gene expres- 
sion pattern that is characteristic of that row only - in 
other words, the nuclei have unique identities encoded 
by expression levels of developmental TFs along the long
axis of the embryo. 

The spatial gradients form a chemical coordinate sys- 
tem: it is thought that each nucleus can read off the lo- 
cal concentration of bicoid (and other morphogens), and 
based on these inputs, drive the expression of the second 
layer of developmental genes (the gap genes, which we 
denote by gi); these in turn lead to ever more refined 
spatial patterns of gene expression that ultimately gen- 
erate the cell fate specification precise to a single-nuclear 
row. For a recent review of the gap gene network, please
see Ref [12]. 

We can make a simple back-of-the-envelope calculation:
if there are 100 distinguishable states of gene expression
along the AP axis responsible for 100 distinct
rows of nuclei, some mechanism must have delivered
$I \approx \log_2(100) \approx 7$ bits of information to the nuclei.
That's the minimum amount of information needed to 
make a decision about the cell fate along the AP axis. 
Intuitively, this number is the same as the minimum of 
how many successive binary ( "yes or no" ) questions are 
needed to uniquely identify one item out of 100: the best 
strategy is to ask such that each question halves the num- 
ber of options remaining. Each answer to the question 
would thus convey 1 bit of information, and reduce the 
initial uncertainty of 7 bits by 1 bit. Similar pattern- 
ing mechanisms also act along the other axes of the em- 
bryo, and if each of the 6000 nuclei at cell cycle 14 were 
uniquely determined, these systems together would have 
to deliver about 13 bits of information. 
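
The arithmetic behind these estimates, and its reading in terms of yes/no questions, can be reproduced in a few lines. This is only a sketch; the function names are ours, and the counts 100 and 6000 are the ones quoted in the text.

```python
import math

def bits_needed(n_states):
    """Information (in bits) needed to single out one of n equally likely states."""
    return math.log2(n_states)

def questions_to_identify(n_items):
    """Worst-case number of yes/no questions when each halves the options."""
    questions = 0
    remaining = n_items
    while remaining > 1:
        remaining = (remaining + 1) // 2   # worst case: the larger half remains
        questions += 1
    return questions

rows_along_ap = 100      # rows of nuclei along the AP axis
nuclei_total = 6000      # nuclei at cell cycle 14

print(bits_needed(rows_along_ap), questions_to_identify(rows_along_ap))
print(bits_needed(nuclei_total), questions_to_identify(nuclei_total))
```

The exact values are $\log_2 100 \approx 6.6$ and $\log_2 6000 \approx 12.6$ bits, which round up to the 7 and 13 binary questions quoted above.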

Let us start by considering the regulation of Hunchback
by bicoid. By simultaneously observing the concentrations
of bicoid ($c$) and hunchback ($g$) across the nuclei
of an embryo, one can sample the joint distribution
$P(c,g)$, see Fig. 10. Usually it was assumed that hunchback
provides a sharp, step-like response to its input, bicoid;
mathematically, this would mean that the bcd/hb
input/output relation is switch-like, with an "on" and an 
"off" state, yielding information transmission capacities 
of about 1 bit. However, is this really the case? 



Using the methods from Section III B combined with
the direct experimental measurements of probability distributions
by Gregor and collaborators [35], one can find
how much information bicoid $c$ and hunchback $g$ carry
about each other. The result is $I_{\rm expt}(c;g) = 1.5 \pm 0.1$
bits, where the error bar is computed across 9 embryos.
This is an experimentally determined quantity, and the
errors [apart from the estimation bias [13]] are related
mostly to our ability to fairly sample the distribution
$P(c,g)$ across the ensemble of nuclei. Our sampling is not
complete because a single microscope view only records
about a quarter of all nuclei, but we believe that this
sampling is not very biased. Another point to keep in
mind is that the computation of $I(c;g)$ reflects all statistical
dependency in the probabilistic relation $c \to g$:
both the direct regulation, as well as any possible indirect
regulation through an unknown intermediary $x$,
e.g. $c \to x \to g$. Thus, for example, if bicoid activates
hunchback which also activates itself, our information estimation
has taken this into account. If, however, $g$ is
regulated also by an input $y$ independent of $c$, that is
$\{c, y\} \to g$, and our experiment does not record $y$, then
we might be assigning some variability (or noise) to $g$, although
that noise really would be a systematic regulatory
effect caused by $y$. In this last case, we would measure a
smaller value of $I(c;g)$ and would underestimate the real
precision in the system; the true value would only be
revealed upon recording the unobserved regulator $y$ and
computing $I(\{c,y\}; g)$. This might be the case for bicoid
regulating Hunchback, since we know that (i) some
hunchback is also maternally deposited (not all hunchback
is made under control of bicoid); (ii) nanos, another
maternally supplied mRNA, establishes a separate protein
gradient extending inwards from the posterior, and
inhibits the translation of Hunchback; (iii) there might
be weaker influences from other morphogens and terminal
patterning factors.

FIG. 10: Drosophila melanogaster embryo at cell cycle 14.
Nuclei stained in blue (see inset), bicoid stained in green and
hunchback stained in red, data reproduced from Ref [35]. At
this stage, about 6000 nuclei are present in the embryo, of
which about a quarter are visible under a single microscope
view. Each nucleus provides a joint quantitative readout
proportional to bicoid and hunchback intensities; the data
are shown in the scatter plot below. Usually hunchback was understood
as having a single precise boundary that separates
the domain of high expression ("on") from the domain of low
expression ("off"). We use information theory to make this
statement precise and to find out if the bicoid/hunchback regulatory
element really can be understood just as a binary
switch.
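
The caveat about an unobserved regulator $y$ can be made concrete with a deliberately simple toy model, which is our own construction and not a model of the bicoid/hunchback system: two independent binary inputs $c$ and $y$ jointly set the output $g = c + y$ without any noise.

```python
import math
from collections import defaultdict

def mutual_information(samples):
    """I(x; g) in bits from an iterable of (x, g, probability) triples."""
    joint = defaultdict(float)
    p_x = defaultdict(float)
    p_g = defaultdict(float)
    for x, g, p in samples:
        joint[(x, g)] += p
        p_x[x] += p
        p_g[g] += p
    return sum(p * math.log2(p / (p_x[x] * p_g[g]))
               for (x, g), p in joint.items() if p > 0)

def response(c, y):
    """Hypothetical noiseless output set jointly by both regulators."""
    return c + y

# c and y are independent and each equally likely to be 0 or 1.
states = [(c, y, 0.25) for c in (0, 1) for y in (0, 1)]

# Experiment that records only c: variation due to y looks like noise in g.
I_c_only = mutual_information([(c, response(c, y), p) for c, y, p in states])

# Experiment that records both regulators.
I_both = mutual_information([((c, y), response(c, y), p) for c, y, p in states])

print(I_c_only, I_both)
```

Recording only $c$ yields 0.5 bits, although the output is a noiseless function of the two inputs and $I(\{c,y\}; g)$ is 1.5 bits; the missing bit is variability wrongly attributed to noise.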

Having these caveats in mind, our first finding is that 
the information transmission of 1.5 bits between bicoid 
and hunchback that we measure from the data is larger 
than 1 bit, which would be needed if the bicoid/hunchback
transformation were a simple binary switch. To our
knowledge this was one of the first times that a quantita- 
tive measure of "regulatory power" was computed for a 
genetic regulatory element that was measured in a high- 
precision experiment. 

FIG. 12: The real (measured) information transmission and
the maximal information transmission (channel capacity) in
the bicoid/hunchback regulatory system. The input/output
relation $P(g|c) = P(\mathrm{Hb}|\mathrm{Bcd})$ is measured and held fixed. To
estimate the true information transmission of 1.5 bits, the experimentally
sampled $P_{TF}(\mathrm{Bcd})$ is used to construct the joint
$P(c,g)$. To find the channel capacity, $P_{TF}(\mathrm{Bcd})$ is varied
until the information-maximizing choice is found numerically,
denoted as $P^*(\mathrm{Bcd})$; this yields 1.7 bits of capacity. The optimal
choice for the input distribution also predicts the optimal
distribution of outputs, shown in Fig. 13.

FIG. 11: The noise in the regulation of hunchback is approximately
Gaussian. Joint nuclear measurements of bicoid and
hunchback are performed across ~13k nuclei in 9 Drosophila
embryos at nuclear cycle 14; data from Ref [35]. Nuclei
are sorted in 100 bins according to their bicoid concentration;
for each bin, we compute the mean hunchback response,
$\langle \mathrm{Hb}(\mathrm{Bcd}) \rangle$, and the noise in the response $\sigma_{\mathrm{Hb}}(\mathrm{Bcd})$. For each
nucleus we take its input bicoid concentration, find the mean
response and noise for that bicoid level and define its z score
as the deviation from the mean, normalized to the noise. The
plot shows a distribution of the z scores across all nuclei (in
red) and compares it to the case where the noise would be
perfectly Gaussian (black) with zero mean and unit standard
deviation. The agreement is reasonable, with real data being
somewhat more skewed.

While the result that 1.5 bits estimated from the data
is larger than 1 bit needed for a binary switch is intriguing,
it would be instructive to have another measure to
compare 1.5 bits to. To this end, we will put an upper
bound on how much information could have maximally
been transferred between bicoid and hunchback, given
the measured level of noise in the system. To do this, let
us start by writing:



$$ P(c,g) = P(g|c) P_{TF}(c). \tag{71} $$



As shown in Section II, the term $P(g|c)$ describes the input/output
properties of the regulatory element. From
experiment, we can determine the mean response $\bar{g}(c)$
of the regulatory element and the noise in the response,
$\sigma_g(c)$. In Fig. 11 we show that the noise found directly
from the measurements, $P(g|c)$, is to a good approximation
Gaussian, $\mathcal{G}$. Therefore these two measurements, $\bar{g}(c)$
and $\sigma_g(c)$, determine $P(g|c)$ to a good approximation.

To ask about the maximum achievable information
transmission given the measured input/output relation
$P(g|c) \approx \mathcal{G}(g; \bar{g}(c), \sigma_g(c))$, we proceed in a manner similar
to that used by Laughlin in his studies of fly vision.
We write the Lagrangian

$$ \mathcal{L}[P_{TF}(c)] = I(c;g) - \Lambda \int dc\, P_{TF}(c), \tag{72} $$

where $\Lambda$ is a Lagrange multiplier that will enforce the
normalization of $P_{TF}(c)$, while

$$ I(c;g) = \int dc\, P_{TF}(c) \int dg\, P(g|c) \log_2 \frac{P(g|c)}{P(g)} \tag{73} $$

is the mutual information, and $P(g) = \int dc\, P_{TF}(c) P(g|c)$.
We can now look for the optimal
distribution of inputs, $P_{TF}(c)$, which must satisfy:

$$ \frac{\delta \mathcal{L}[P_{TF}(c)]}{\delta P_{TF}(c)} = 0. \tag{74} $$



One way to solve this variational problem is numerically. 
For details see Refs [S] EH |35] ; here we only report on 
the results. 
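
To give a flavor of such a numerical optimization, the sketch below applies the standard Blahut-Arimoto iteration to a discretized channel. We stress that this is a generic textbook algorithm used here for illustration; we do not claim it is the procedure used in the cited references, and the two toy channels are hypothetical rather than the measured $P(g|c)$.

```python
import math

def blahut_arimoto(channel, tol=1e-12, max_iter=10000):
    """Channel capacity (bits) and optimal input distribution.

    channel[i][j] = P(g_j | c_i) is held fixed; the input distribution is
    updated iteratively until the transmitted information stops increasing.
    """
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in            # start from a uniform input distribution
    capacity = 0.0
    for _ in range(max_iter):
        # Output distribution for the current inputs: P(g) = sum_c P(c) P(g|c).
        q = [sum(p[i] * channel[i][j] for i in range(n_in)) for j in range(n_out)]
        # d[i]: divergence (bits) between P(g|c_i) and P(g).
        d = [sum(w * math.log2(w / q[j]) for j, w in enumerate(channel[i]) if w > 0)
             for i in range(n_in)]
        info = sum(p[i] * d[i] for i in range(n_in))   # current I(c;g)
        # Reweight inputs towards those that are more informative.
        weights = [p[i] * 2.0 ** d[i] for i in range(n_in)]
        z = sum(weights)
        p = [w / z for w in weights]
        converged = abs(info - capacity) < tol
        capacity = info
        if converged:
            break
    return capacity, p

# Toy channel 1: symmetric binary channel with 10% errors.
cap_sym, p_sym = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])

# Toy channel 2: input 0 is read out perfectly, input 1 is misread half the time.
cap_z, p_z = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])

print(cap_sym, p_sym)
print(cap_z, p_z)
```

For the symmetric channel the optimal inputs are equally likely, while for the asymmetric channel the optimum uses the noisier input less often; in both cases the capacity is an upper bound on the information obtained with any other input distribution.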

We find that holding $P(g|c)$ fixed as determined from
the data on the bicoid/hunchback relationship, and optimizing
$P_{TF}(c)$ numerically, yields the maximal channel capacity
of $I^*(c;g) = 1.7$ bits, see Fig. 12. Additionally,
the optimal $P^*_{TF}(c)$ predicts the optimal distribution of
hunchback expression levels observed across the ensemble
of nuclei, through $P^*(g) = \int dc\, P(g|c) P^*_{TF}(c)$, and
the optimally predicted distribution matches the measured
distribution very well [Fig. 13]. The value found
for the maximal information transmission (channel capacity)
shows that the real biological system is operating
close to what is achievable given the noise, that is
$I_{\rm expt}(c;g) / I^*(c;g) \approx 90\%$. The high value is somewhat
unexpected given that we know that hunchback is regulated
also by other inputs, and that bicoid also regulates
other targets. Nevertheless this finding is a good motivation
to consider taking maximization of information
transmission seriously as a possible design principle.

specification.

In Section III B we briefly described the Gaussian
channel approximation, where in addition to Gaussian
additive noise one assumes that the input distribution
$P_{TF}(c)$ is well-approximated by a Gaussian, and the input/output
relation is linear. Clearly, this is not the case

FIG. 13: The measured (black) and predicted optimal (red)
distribution $P(g)$ of hunchback expression levels across an ensemble
of nuclei in the Drosophila embryo. The expression
level $g$ goes from 0 (no induction, posterior) to 1 (full induction,
anterior). A considerable fraction (~30%) of nuclei
express intermediate levels of hunchback, and the noise in the
system is low enough that this intermediate expression level
could constitute a separate signaling level from 0 and 1; this
would be consistent with the observed information of 1.5 bits
that intuitively corresponds to $2^{1.5} \approx 3$ distinguishable levels
of gene expression. The inset shows the same plot on the
logarithmic scale.

How should we understand the values in the range of 
/ ~ 1.5 — 1.7 bits? It turns out that the bicoid gradient 
is read out directly by 4 gap genes: hunchback, krup- 
pel, giant and knirps. If each would independently be 
able to encode ~ 1.5 bits, then together this genes could 
convey I{c;{gi]) ~ 6 — 7 bits of information about bi- 
coid and would thus achieve the amount needed for AP 
patterning. In this case, we would be able to claim con- 
sistency with the back-of-the-envelope calculation that 
requires at least this amount of information for the AP 
specification. Before reaching such a conclusion, how- 
ever, we need to resolve the following issues: (i) The 
readout (gap) genes {gi\ are probably not independent, 
but have some redundancy, which will mean that they 
convey less than the sum of their individual information 
values about c; such redundancy, as we find below, can 
be alleviated by proper network wiring; (ii) The next 
layer of developmental cascade after the gap genes is not 
regulated solely through the gap genes, but receives in- 
puts from maternal morphogens directly; therefore, the 
gap genes are not a single bottleneck through which the 
information can flow; (iii) especially at the poles of the 
embryo, gradients other than bicoid provide spatial infor- 
mation about the AP position; (iv) our formulation of 
the problem assumes steady state gap gene readout from 
a stable gradient; it is not clear that such steady state 
is really reached in the timeframe necessary for nuclear 



at hand: the input/output relation is nonlinear [Fig 10 



the resulting distributions of hunchback are strongly bi- 
modal [Fig IIs] and the input distribution of Bed is also 
not Gaussian [not shown]. 

In Ref [in] Emberly showed that there is an optimal morphogen decay length that minimizes the amount of input proteins that need to be produced, while allowing the target output gene to reach the desired precision. Applied to the bcd/hb system, the predicted decay length scale is consistent with the properties of the experimentally observed bcd gradient. Interestingly, Emberly showed that the optimal input bcd gradient also achieves a near maximal transmission of information, making it consistent with the predictions summarized above [S].

Further experiments and theory will be needed to successfully address outstanding issues and to check whether the near-optimality in information transmission is maintained as larger portions of the network are recorded experimentally. We hope that the discussion nevertheless provides enough motivation for looking at quantities like $I(x;c)$, the information that the morphogen gradient encodes about the physical location $x$; at $I(c;\{g_i\})$; and at $I(x;\{g_i\})$, the information that later developmental genes (like gap genes) carry about the physical location. Information processing inequalities also constrain the relationships between these (directly measurable) quantities, providing an implicit check of whether we have missed some unobserved regulatory pathway. Before proceeding, we note that experiments that probe these quantities are not easy, because they require us to measure simultaneously the expression levels of a number of genes, nucleus by nucleus, in order to estimate both the mean response, $\bar{g}_i(c) = \langle g_i(c) \rangle$, as well as the noise covariance in the responses, $C_{ij}(c) = \langle \delta g_i(c) \delta g_j(c) \rangle$, where $\delta g_i = g_i - \bar{g}_i(c)$ and brackets denote averaging with respect to the ensemble of nuclei.
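One simple way to obtain these two quantities from nucleus-by-nucleus measurements is to bin the nuclei by input concentration and compute the sample mean and covariance in each bin. The sketch below assumes hypothetical arrays `c` (input level per nucleus) and `g` (expression of K genes per nucleus). Binning is only one possible estimator, and the finite bin width adds a small spurious contribution to the covariance wherever the mean response is steep.

```python
import numpy as np

def response_statistics(c, g, n_bins=20):
    """Bin nuclei by input c; return bin centers, mean responses and covariances.

    c : array (n_nuclei,)     input concentration measured in each nucleus
    g : array (n_nuclei, K)   expression levels of K genes in each nucleus
    """
    c = np.asarray(c, dtype=float)
    g = np.asarray(g, dtype=float)
    edges = np.linspace(c.min(), c.max(), n_bins + 1)
    # Bin index of every nucleus; the maximum of c is folded into the last bin.
    idx = np.clip(np.digitize(c, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    K = g.shape[1]
    g_mean = np.full((n_bins, K), np.nan)
    cov = np.full((n_bins, K, K), np.nan)
    for b in range(n_bins):
        sel = g[idx == b]
        if len(sel) < 2:
            continue                              # too few nuclei for a covariance
        g_mean[b] = sel.mean(axis=0)              # mean response in this bin
        dg = sel - g_mean[b]                      # deviations from the mean
        cov[b] = dg.T @ dg / (len(sel) - 1)       # unbiased sample covariance
    return centers, g_mean, cov
```

Bins with fewer than two nuclei are left as NaN so that they can be masked out in any downstream calculation.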



IV. INFORMATION TRANSMISSION IN 
REGULATORY NETWORKS 

A. Small noise approximation 

Having seen that in at least one biological system the information transmission approaches the channel capacity (maximum achievable transmission given noise), we would like to elevate this finding to a principle: let us find network wiring diagrams and interaction parameters that transmit the most information from input TFs to the regulated output genes. Before we start we should note that this is a very ambitious goal: we are trying to derive (not fit!) the structure of a genetic regulatory network. With all the approximations and simplifications that need to be made (also in the absence of experimentally measured parameters like protein decay times, diffusion constants etc.), our standard for success will be whether we manage to qualitatively reproduce the gap gene expression patterns observable in the fly.

Analytically, the problem of finding the maximum information transmission [Eq (74)] is tractable in the so-called small-noise limit, where across most of the input range the noise over the mean is small, $\sigma_g(c)/\bar{g}(c) \ll 1$. This is the limit which we present and use in the following section to explore the optimal architecture of small regulatory networks.

We will consider networks where a single transcription factor at concentration $c$ can regulate a set of $K$ target genes $\{g_i\}$, $i = 1, \dots, K$, which may be interacting in a feed-forward network. For now, we will not consider feedback loops that can cause multistable behavior. It is clear that without any constraint, the information transmission can trivially be increased by decreasing the noise, and in biochemical networks noise can be decreased arbitrarily by increasing the number of signaling molecules, both on the input side ($c$) and on the output side ($\{g_i\}$). The crucial idea is therefore to optimize information subject to biophysical constraints, i.e. subject to using a fixed number of signaling molecules.

With these assumptions in mind, we sketch the deriva- 
tion of information transmission in the following text; for 
details see Refs |311 S?) 48 . For additional work on in- 
formation transmission in biochemical networks see Refs 

The dynamics of gene expression for genes {gi} is given 
by a generalization of Eq Q which we used for the case 
of a single gene: 



$$\tau \frac{dg_i}{dt} = f_i(c, \{g_j\}) - g_i + \xi_i, \qquad (75)$$



where $\tau$ is the protein lifetime, $\xi_i$ is the Langevin noise force with $\langle \xi_i(t) \xi_j(t') \rangle = \delta(t - t') N_{ij} = \delta(t - t') \delta_{ij} N_i$. Before proceeding we note that in physical units the input $c$ goes between 0 and $c_{max}$, but when we write down the noise strength (again a sum of input and output noise contributions as in the case of a single gene), we note that this problem has a "natural" concentration unit, $c_0 = N_{max}/(D a \tau)$, i.e. the maximum number of independent molecules of the output, $N_{max}$, divided by the relevant diffusion constant $D$, the typical size of the binding site $a$ and the integration time (protein lifetime) $\tau$. This is simply the scale of output noise divided by the scale of the input noise. With this unit in hand we can make the concentrations dimensionless, so that $c \in [0, C]$, where $C = c_{max}/c_0$, and all $g_i \in [0, 1]$ as before.

For completeness, we provide the expression for the noise magnitude $N_i$, which is a generalization of the term explained in Section II E:



N,. = 



f 9f^{c■, {gi}) 



dc 



E 



9k 



dftjc; {gi}) 
dc 



\{gk=gk(c)}^ 



(76) 



the first term again corresponds to the output noise, and 
second and third terms in the parenthesis correspond to 
the diffusion noise (due to the diffusion of c and of other 
TFs {gi}, respectively). 



In Eq (75), $f_i(c, \{g_j\}) \in [0, 1]$ is the regulatory (input/output) function, describing the activation rate of gene $g_i$, given the input $c$ and the expression levels of all the other genes. Various regulation functions were discussed in Section II; for combinatorial regulation, the most flexible one that we have examined was the Monod-Wyman-Changeux (MWC) regulation function:



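$$f_i(c, \{g_j\}) = \frac{1}{1 + e^{F_i(c, \{g_j\})}}, \qquad F_i(c, \{g_j\}) = L^i - n_c^i \log(1 + c/K_c^i) - \sum_j n_j^i \log(1 + g_j/K_j^i). \qquad (77)$$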



In this model, the regulation of $g_i$ is jointly affected by the input $c$ and the level of other gap genes, $\{g_j\}$, which is reflected by the various contributions to $F_i$: every regulatory input to $g_i$ contributes a term to the "free energy" $F_i$, and each such term is parametrized by $n_j^i$, the number of binding sites for $g_j$ in the promoter of $g_i$, and $K_j^i$, related to the energy of binding to that binding site; as before, $L^i$ is the free energy offset between the "on" and "off" states when no transcription factor is bound. If we want to avoid feedback and multistability, we can always renumber the genes such that each gene $g_i$ only depends on the input $c$ and other genes $g_j$ where $j < i$.

The regulation in a network of a single input $c$ and $K$ target genes $g_i$ is then described by the unknown constants $\{L^i, K_j^i, n_j^i, n_c^i, K_c^i\}$. When $n_j^i \to 0$, the regulation of gene $i$ by gene $j$ is absent, that is, in the wiring diagram the arrow from $g_j$ to $g_i$ disappears.

Before proceeding, we need also to compute the noise in this regulatory network. The noise in $g_i$ is given by two contributions: the output noise from generating a finite number of proteins of $g_i$, and the input diffusive noise because $g_i$ is regulated by $c$ and other $g_j$. The noise in our setup with $K$ target genes is fully determined by a $K \times K$ covariance matrix:

$$C_{ij}(c) = \langle (g_i - \bar{g}_i(c)) (g_j - \bar{g}_j(c)) \rangle, \qquad (78)$$

which can be computed from Eqs (77), as shown in Refs [44, 48]. Here we briefly outline how to do this. By linearizing the dynamical equations in Fourier space for the output concentrations in Eqs (75), one obtains a matrix equation of the form (tildes denote Fourier transforms):



(79) 



In a manner completely analogous to Eq (36) but generalized to $K$ output genes, we can then compute the elements of the covariance matrix in Fourier space as:



duj 
2t: 



(80) 



where the noise magnitudes $N_i$ are given by Eq (76).

In addition to computing this matrix, we find that there is a single dimensionless parameter $C$ in our problem, describing the dynamic range of the input, $c \in [0, C]$.
This parameter will control the shape of the optimal 
solutions [84]. C is the maximal concentration for the 
input c, expressed in "natural units of concentration," 
which describes the balance between the input and out- 
put noise strengths. Large values for C mean that the 
output noise is dominant over the input noise, while a 
small dynamic range and therefore small C means that 
the input diffusive noise in c is the dominant noise in 
the system. Alternatively, changing C reflects how many 
input molecules are at the disposal for communication - 
larger values of C are more "costly" in metabolic terms, 
but allow more information to be transmitted. 

With the noise covariance matrix in hand, the distribution of outputs given the input $c$ is a multivariate Gaussian:

$$P(\{g_i\}|c) = \frac{1}{(2\pi)^{K/2} \det[C_{ij}(c)]^{1/2}} \exp\left[ -\frac{1}{2} \sum_{i,j=1}^{K} (g_i - \bar{g}_i(c)) (C^{-1})_{ij} (g_j - \bar{g}_j(c)) \right]. \qquad (81)$$

Suppose that we now ask the opposite question: having seen the values of gap genes $\{g_i\}$, what is the most likely value of $c$ that produced them, and what is the variance in $c$? If the noise is small, $P(c|\{g_j\})$ will also be Gaussian, which can be found from Eq (81) and Bayes' theorem:

$$P(c|\{g_j\}) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left[ -\frac{1}{2} \frac{(c - c^*(\{g_j\}))^2}{\sigma_c^2} \right], \qquad (82)$$

where $c^*(\{g_j\})$ is the most likely value for $c$ that gives rise to the observed $\{g_j\}$, and

$$\frac{1}{\sigma_c^2} = \sum_{i,j=1}^{K} \frac{\partial \bar{g}_i}{\partial c} (C^{-1})_{ij} \frac{\partial \bar{g}_j}{\partial c}; \qquad (83)$$

$\sigma_c$ is the effective noise level in the input that accounts for all the noise in the system[55]; this effective noise is computable from the noise covariance matrix and the mean input/output relations.

Following Eq (52), the information between the input and the outputs $I(c;\{g_j\})$ is

$$I(c;\{g_j\}) = S[P_{TF}(c)] - \langle S[P(c|\{g_j\})] \rangle_{P_{TF}(c)}, \qquad (84)$$

where the distribution of inputs, $P_{TF}(c)$, is unknown. We want to find the maximal information transmission given the known noise; therefore, we look for the maximum of $I$ with respect to $P_{TF}(c)$, just as we did in Eq (74), while insisting that $P_{TF}(c)$ be normalized. Following the derivation in Refs [HI HT] we find that

$$P^*_{TF}(c) = \frac{1}{Z} \frac{1}{\sigma_c(c)}, \qquad (85)$$

that is, the system should optimally use those input levels $c$ more frequently that have proportionately smaller effective noise. Using this optimal choice the information, in bits, will be:

$$I^*(c;\{g_j\}) = \log_2 \left( \frac{Z}{\sqrt{2\pi e}} \right), \qquad (86)$$

where $Z = \int dc\, \sigma_c^{-1}(c)$ is the normalization of the distribution in Eq (85).

This is as far as we can push analytically; $I^*(c;\{g_j\})$ still depends through $Z$ on the parameters $\{L^i, K_j^i, n_j^i, n_c^i, K_c^i\}$ that determine the wiring diagram of the network and the strengths of the regulatory arrows. The last remaining task is, therefore, to numerically optimize Eq (86) with respect to these parameters, and examine the structure of optimal solutions.

FIG. 14: The optimal input/output relations for repressors (blue line, dashed) and activators (red line, solid) for one gene $g$ regulated by one input $c$ with no feedback.
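Once the effective noise $\sigma_c(c)$ is known on a grid of inputs, the small-noise optimum reduces to one numerical integral: the optimal input distribution is proportional to $1/\sigma_c(c)$, and the information in bits is $\log_2(Z/\sqrt{2\pi e})$ with $Z = \int dc\, \sigma_c^{-1}(c)$. The following is a minimal sketch. The function names are hypothetical, and the single-gene example uses an assumed Hill response together with an assumed noise variance of the form (output term + input term)/n_max, with made-up parameter values.

```python
import numpy as np

def effective_input_noise(dg_dc, cov):
    """Effective input noise sigma_c(c) from the mean responses and covariance.

    dg_dc : (n_c, K)     derivatives of the mean responses with respect to c
    cov   : (n_c, K, K)  noise covariance matrix at each input level
    """
    cov_inv_dg = np.linalg.solve(cov, dg_dc[..., None])[..., 0]   # C^{-1} dg/dc
    inv_var = np.einsum("ck,ck->c", dg_dc, cov_inv_dg)            # 1 / sigma_c^2
    return 1.0 / np.sqrt(inv_var)

def small_noise_optimum(c, sigma_c):
    """Return Z, the optimal input distribution on the grid, and the information in bits."""
    inv = 1.0 / sigma_c
    Z = float(np.sum(0.5 * (inv[1:] + inv[:-1]) * np.diff(c)))    # trapezoid rule
    info_bits = float(np.log2(Z / np.sqrt(2.0 * np.pi * np.e)))
    return Z, inv / Z, info_bits

if __name__ == "__main__":
    # Hypothetical single activator; parameters are illustrative, not fitted.
    h, K_d, n_max = 3.0, 1.0, 100.0
    c = np.linspace(0.05, 5.0, 2000)              # start above 0 so that dg/dc > 0
    g = c**h / (K_d**h + c**h)
    dg = h * K_d**h * c**(h - 1) / (K_d**h + c**h) ** 2
    cov = ((g + c * dg**2) / n_max)[:, None, None]
    sigma_c = effective_input_noise(dg[:, None], cov)
    Z, p_opt, bits = small_noise_optimum(c, sigma_c)
    print(f"Z = {Z:.2f}, information = {bits:.2f} bits")
```

Wrapping `small_noise_optimum` in a function of the regulatory parameters and handing it to a generic optimizer is one way to carry out the final numerical step described in the text.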



B. Optimal network architectures 

We can finally ask what are the optimal input/output relations for $K$ genes $\{g_i\}$, regulated by the single input $c$, if we do or do not allow for mutual interactions between the outputs. These results are a function of $C$, the dynamic range of the input, which is the single parameter of our optimization problem.

Let us start by considering the simple case of one input, $c$, regulating one output $g$ [47]. In Fig 14 we plot the two optimal regulation functions for an activated and a repressed gene. These results correspond to two well-defined optima in $I(c;g)$ as a function of the two parameters defining the input/output function, the cooperativity $h$ and the dissociation constant $K_d$. These optimal solutions result from the balance between the two (input and output) components of the noise that limit the information transmission at different values of $c$: the solutions are a compromise between avoiding readouts at low input concentrations, where input noise is largest (pushing $K_d$ and $h$ to higher values), and being able to distinguish different levels of outputs reliably (pushing $h$ and $K_d$ lower). Because the form of the noise is different for repressed and activated genes, the two optimal input/output relations are not mirror images of each other. However, the capacities of an activated and a repressed gene are comparable, with the slight advantage of activated genes over repressed genes increasing as the resources become scarcer (for smaller $C$).

Figure 15 shows the example solutions for $K = 5$ noninteracting genes as a function of $C$. We see that there are two regimes: at low $C$, the optimal solutions for all 5 genes have exactly the same parameters, and therefore their input/output curves overlap perfectly. Why is this behavior optimal, if at first glance all the genes appear completely redundant? At low $C$, the input noise is dominant, and the best strategy is to have all $K = 5$ genes read out the input $c$ and lower the input noise by averaging: using $K$ readouts should lower the effective noise by a factor of $\sqrt{K}$.

At high $C$ another strategy, called the tiling solution, becomes optimal: here, each gene $g_i$ changes its expression considerably over some limited range of inputs, and various genes $g_i$ encode various non-overlapping input ranges; in other words, each $g_i$ "reports" on its own range of inputs, while the other $g_j$ have either not switched on yet, or are already saturated. We can explore the transition from redundant to tiling solutions in detail, and we can carefully study the scaling of information capacity $I(c;\{g_i\})$ with the number of genes $K$ in each solution

El!- 
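The averaging argument for the redundant regime can be checked with a few lines of simulation. The sketch assumes that the $K$ readouts carry independent, identically distributed Gaussian noise, which is an idealization. Under that assumption the standard deviation of the averaged readout falls as $1/\sqrt{K}$; if the effective input noise were reduced uniformly by this factor, the small-noise information would rise by $\frac{1}{2}\log_2 K$ bits.

```python
import numpy as np

def averaged_readout_std(K, sigma=1.0, n_trials=200_000, seed=0):
    """Standard deviation of the mean of K independent noisy readouts."""
    rng = np.random.default_rng(seed)
    readouts = rng.normal(0.0, sigma, size=(n_trials, K))   # one row per trial
    return float(readouts.mean(axis=1).std())

if __name__ == "__main__":
    single = averaged_readout_std(1)
    for K in (1, 2, 5, 10):
        reduction = single / averaged_readout_std(K)
        print(f"K = {K:2d}: noise reduced by {reduction:.2f}, sqrt(K) = {np.sqrt(K):.2f}")
```

Correlated noise between the readouts, for example from a shared input fluctuation, would reduce the benefit of averaging below this idealized scaling.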

Although interesting from a theoretical perspective, the redundant and tiling solutions are not what is actually observed in the real gap-gene network of Drosophila. In particular, when $\{g_i\}$ are independent, the only possible input/output relations are sigmoid; there are no stripe-forming solutions, where $g_i$ would turn on at some concentration $c$ and turn off at some higher concentration. Can such solutions emerge if activating and repressing interactions between the output genes are allowed?

Indeed we find that this is the case, as shown in Fig 16. If the interactions between two output genes $\{g_1(c), g_2(c)\}$ are allowed (and optimized over), the information-maximizing wiring diagram includes "lateral repression" between the two genes that are jointly activated by a common input. This also generates effective input/output curves that are non-monotonic in $c$: $g_2$ as a function of $c$ is seen to exhibit a stripe of activation. Further work has confirmed that such stripe-like patterns optimize information transmission [48]. Interestingly, a similar pattern of interconnections ("lateral inhibition") is known to occur in neural networks involved in the retinal processing of visual stimuli, and is thought to serve the function of removing redundancy in the neural code due to correlations in the stimulus and receptive field overlap. The function of such connections in genetic regulation is to decrease the redundancy in the outputs as well: with no interconnections in the tiling solution, when the gene with the highest $K_i$ is saturated and fully active, we know that all the other genes are also fully on and saturated; they are therefore providing redundant information. In other words, when there are no interactions, the only patterns of activation[86] are 000, 001, 011, 111 for a case of 3 genes. Patterns such as 010 or 110 cannot be accessed if there are no lateral interactions. If they exist, however, these patterns can be generated and they can encode additional useful information about their input $c$, increasing information transmission.

FIG. 15: The optimal input/output relations for $K = 5$ genes, $\{g_1(c), \dots, g_5(c)\}$ (shown in various colors), regulated independently by a common input, $c$. The first 5 panels show optimal solutions depending on the dynamic range of the input, $C$, that is, when $c \in [0, C]$. As $C$ is increased, the totally redundant solution, where $g_1(c) = \dots = g_5(c)$, slowly becomes non-redundant and transitions into the tiling solution at high $C$, where each $g_i$ independently covers a subrange of concentrations for the input $c$. The last panel shows the optimal values for the dissociation constants, $K_i$, of all 5 genes, as a function of $C = c_{max}/c_0$.

FIG. 16: The optimal input/output relations for two genes $g_1$, $g_2$ regulated by a common input $c$, with cross-regulatory feed-forward interactions and the Hill model of regulatory functions (top) and the MWC model (bottom). In case the activating arrow is allowed between $g_1$ and $g_2$, the optimal solution (gray lines, $A_1A_2$-$A_{12}$) is not different from a non-interacting system, where $c$ independently regulates $g_1$ and $g_2$: both the input/output curves as well as the information transmission values are the same. In the case where $c$ activates $g_1$ and $g_2$, but $g_1$ can repress $g_2$, qualitatively new input/output shapes can be optimal (black lines, $A_1A_2$-$R_{12}$). Here, the combinatorial regulation of $g_2$ by $g_1$ and $c$ makes the apparent input/output relation $g_2(c)$ behave non-monotonically and produce a stripe.
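To see how lateral repression can turn a sigmoidal response into a stripe, consider a toy version of the two-gene circuit: the input activates both genes, and $g_1$ represses $g_2$, with the two inputs to $g_2$ combined by multiplication (AND logic). The Hill forms and all parameter values below are hypothetical and chosen by hand for illustration; they are not the optimized solutions discussed in the text.

```python
import numpy as np

def hill_activation(x, K, h):
    """Rising Hill function with values in [0, 1]."""
    return x**h / (K**h + x**h)

def hill_repression(x, K, h):
    """Falling Hill function with values in [0, 1]."""
    return K**h / (K**h + x**h)

def two_gene_response(c, K1=1.0, K2=0.3, K12=0.3, h=4.0):
    """Mean responses of g1 (activated by c) and g2 (activated by c, repressed by g1)."""
    g1 = hill_activation(c, K1, h)
    # AND logic: g2 is on only if c is present and g1 is still low.
    g2 = hill_activation(c, K2, h) * hill_repression(g1, K12, h)
    return g1, g2

if __name__ == "__main__":
    c = np.linspace(0.0, 3.0, 301)
    g1, g2 = two_gene_response(c)
    print("g2 is maximal at c =", c[np.argmax(g2)])
    print("g2 at the largest input:", g2[-1])
```

Because $g_2$ has the lower activation threshold, it switches on first and is shut off again once $g_1$ accumulates, so the pattern ($g_1$ off, $g_2$ on) becomes accessible at intermediate inputs. Making the repression threshold `K12` very large removes the interaction and restores a monotonic $g_2(c)$.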

The results we summarized so far are for the case of a functional model of regulation described by combinatorial Hill regulation with AND logic [Eq (16)]. As we mentioned in Section II, other phenomenological models such as the MWC model can be used to describe the activation of one gene by many transcription factors. In the lower panel of Fig 16 we plot the optimal network for the same two interacting genes as in the upper panel, but we describe their regulation via an MWC model instead of a combinatorial Hill model. Although the wiring of the optimal networks in the two cases of regulatory models is the same, the input/output relations for the case of MWC regulation exhibit both genes in the "on" state for large concentrations of input $c$. In the simplified picture where we view the genes as being "on" and "off" only, this "on"/"on" state affords the MWC model another distinguishable state that encodes the input, and thus results in the MWC model achieving a higher information capacity in comparison to the Hill model. The non-interacting solutions for the two regulatory functions are the same.

At this point we would like to stress again what are the assumptions and what are the results of the approach presented here. We assume that (i) the information is optimized, (ii) the small-noise approximation is applicable, (iii) the input/output functions come from a fixed family (of, for instance, Hill or MWC regulatory functions), and (iv) the form of the noise is fixed to have an input and an output component, as in Eqs (43, 76); the last assumption introduces a single tunable dimensionless parameter, $C = c_{max}/c_0$, on which the optimal solutions depend. What we find from the optimization
calculation is whether a given gene is regulated or not by 
a given input (an optimized result of = or = 
means there is no interaction, even if we allowed for one, 
from gene i to gene j), whether the interaction is activat- 
ing or repressing (sign of n), and specifically what is its 
strength (values of K and n). Therefore we learn both 
the topology of the optimal network and the directions 
and strengths of the "arrows" in its wiring diagram. 

Our understanding of information transmission in 
transcriptional networks is far from complete. Never- 
theless, the richness of solutions and network topologies 
that emerges from a single optimization principle in a 
one-parameter (C) problem is very encouraging, espe- 
cially since we already observe a qualitative match to 
the stripe-like solutions in early Drosophila development. 
Further efforts need to be invested into understanding 
multi-stability, feedback loops and autoregulation, and 
in the incorporation of other biologically realistic details. 
Hopefully, this (or some other) design principle will in 
the future enable us to understand the wiring of biologi- 
cal networks and derive it from a mathematical measure 
of their function, rather than reconstructing it back from 
painstaking molecular disassembly of the network into its 
constituent parts. 



C. Beyond the small noise approximation 

The results presented in the previous sections were computed in the small noise approximation, i.e. under the assumption that the system is well-described by the set of mean input/output relations along with a (small) Gaussian noise envelope. However, in real networks the small noise approximation might not be applicable for two reasons: first, the noise might be Gaussian in form but not small compared to the mean, and second, the noise might not have a Gaussian distribution. The first possibility was raised already in Refs 151, , where we showed that in the real bcd/hb system the small-noise result and the exact result are similar, but the small-noise approximation underestimates the capacity by about 25%. In this section we review more abstract work which analyzes the general properties of information transmission in regulatory elements, without making any assumptions about the form of the noise.

We start by writing down the full stochastic model for the regulatory circuit of interest using a master equation, which will be a generalization of Eq ([5| to more than one gene. The information transmission $I$ between the input and output of a circuit is then computed directly from the definition in Eq (56), subject to the constraint on the mean total number of produced signaling molecules. Specifically, we are maximizing $\mathcal{L} = I - \Lambda \sum_{\ell=1}^{L} \langle n_\ell \rangle / L$, where $L$ is the number of signaling protein species, $n_\ell$ are their counts and $\Lambda$ is the Lagrange multiplier enforcing the constraint. An example in Fig 17 shows the capacity results for a two-step regulatory cascade. In general, with this approach the computational difficulty lies in solving for the steady state probability distribution of a master equation. Since the goal is to optimize the information transmission (which requires many evaluations of the steady state distribution for different choices of parameters and inputs), one must have a fast and accurate method for solving the master equation. For this purpose we derived the spectral method [101 HI], which is reviewed in detail in Ref [T7].

In Section IV B we found that information transmission is increased if the system is able to access distinguishable gene expression states. Consequently we wondered if there exists a scenario where an optimal network would transform a unimodal input into a bimodal output. We specifically considered a cascade of length $L$, where at each step the regulation was taken to be stepwise: there were two protein production rates, one above the threshold, $q_+$, and one below the threshold, $q_-$. We found that for large enough jumps in regulation, $\delta = |q_+ - q_-|$, the optimal way to transmit information in a cascade of $L > 3$ is to generate a bimodal output. For a fixed value of the jump parameter, cascades of repressed genes transmit the same amount of information as activated genes. However, a cascade with repressed genes needs to produce more proteins to achieve the same capacity as a cascade with activated genes (see Fig. 17). The difference is most significant when we restrict the total mean number of available proteins to be small. In this regime of constrained resources, the master equation approach is especially useful. We also observed how information decreases with every step of the cascade, as expected from the data processing inequality presented in Section III B.

The issue of how a limit on the number of available signaling molecules (signaling cost) affects the choice of optimal regulatory functions appeared already in Section IV B in the form of $C$, the maximal concentration of input molecules. In Ref [5T] we investigated a detailed model of gene regulation, which explicitly considered two gene expression states, at a basal and an enhanced expression level. Surprisingly, we found that when the gene expression state changes on slow timescales, the information transmitted is larger than if the gene expression state is equilibrated (see Fig. 18). This serves as yet another example of how capacity can be increased without increasing the cost, by making clever use of the regulatory mechanisms (as was the case with the "lateral inhibition"). The slow change in the gene expression state generates a bimodal output distribution, as opposed to a unimodal distribution in the equilibrated case.

FIG. 17: The capacity of a cascade of length $L = 3$ as a function of the mean number of proteins produced in the cascade, $\langle n \rangle$. A comparison of cascades where in each step the downstream gene is activated (crosses) and repressed (circles). At a fixed (and low) total protein number, the cascade with positive regulation yields higher capacity than the cascade with negative regulation. These results are derived using the master equation model, with threshold (positive or negative) regulation in each step of the cascade. The input to the cascade is assumed to be Poisson with an optimized mean.
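The loss of information along a cascade, as required by the data processing inequality, can be illustrated numerically on a toy chain in which each step is a discrete noisy channel. The channel used below (keep the state with probability $1 - \epsilon$, otherwise pick a state uniformly at random) is a hypothetical stand-in for one regulatory step, not the master-equation model discussed in the text.

```python
import numpy as np

def mutual_information_bits(p_x, P_yx):
    """I(X;Y) in bits for input distribution p_x and channel P_yx[x, y]."""
    p_xy = p_x[:, None] * P_yx                    # joint distribution P(x, y)
    p_y = p_xy.sum(axis=0)
    indep = p_x[:, None] * p_y[None, :]           # product of the marginals
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / indep[mask])))

def noisy_step(n_states, eps):
    """Channel that keeps the state w.p. 1 - eps, else draws a uniform random state."""
    return (1.0 - eps) * np.eye(n_states) + eps / n_states * np.ones((n_states, n_states))

def cascade_information(p_in, steps):
    """Information between the input and the output of each successive step."""
    infos = []
    P = np.eye(len(p_in))
    for T in steps:
        P = P @ T                                 # compose channels: P(x_l | input)
        infos.append(mutual_information_bits(p_in, P))
    return infos

if __name__ == "__main__":
    p_in = np.full(4, 0.25)
    for level, bits in enumerate(cascade_information(p_in, [noisy_step(4, 0.2)] * 3), 1):
        print(f"after step {level}: {bits:.3f} bits")
```

Each additional noisy step can only lose information about the input, so the printed values decrease with the depth of the cascade.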



D. Beyond the static and steady state assumptions 

The discussion so far has been limited to the steady state solutions and signals that are static or that vary in time slower than all other time-scales in the problem. One of the remaining theoretical challenges is to understand information transmission in a fully dynamical, non-linear system. In this case one might focus on various quantities, for instance the instantaneous or mean information rate, or the total information transmitted as a function of time. In this section we briefly mention recent approaches that explored some of these quantities in certain tractable limits.

FIG. 18: A) A regulatory scheme for a two-step cascade, where external input controls the level of gene 1 (with protein count $n$), which in turn controls the production of protein 2 (with protein count $m$). Both genes have two states: an activated and a basal level of expression. B) and C) show the two optimal distributions $P_{nm}$ in two regimes of circuit operation: the unimodal distribution in B is optimal for fast switching between the two expression states, and the bimodal distribution in C is optimal for slow switching between the two states. In the limit of slow switching between the two gene expression states the optimal circuit transmits more information between the input and output than in the limit of fast switching.

As mentioned in the Introduction, a prominent exam- 
ple of time-varying signals occurs in the oscillatory be- 
havior of TFs involved in the cell cycle. In this system, 
one can ask about the ability of the genes downstream 
from the cell cycle oscillator to tell the phase of the sig- 
nal, which is assumed to change in a sinusoidal fashion, 
f{t) — g + asm{ujt). In Ref ^2] we considered a two-gene 
circuit where the expression level of the the first gene is 
f{t); its products in turn regulate the second (output) 
gene via a threshold function. The information between 
the output protein count m and the phase can be cal- 
culated knowing the oscillatory steady state solution for 
the output probability distribution: 



I{(j);m) = 
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(87) 



where P(0) = l/27r and = d0P(m|0)P(0). For 
threshold regulation, the information is optimized for 
a certain non-zero value of the driving frequency. For 
infinitely slow oscillations, the system discriminates be- 
tween three states in output expression: high, low and 
intermediate. In the limit of very fast oscillations, all 
the expression states are averaged together and become 
indistinguishable, forcing the information content to de- 



crease. In an intermediate regime, two new intermediate 
states appear in addition to the high and low state. The 
output is now able to encode whether the signal is in- 
creasing or decreasing. We again find that the ability 
of the output to discriminate between different states al- 
lows for the output protein concentration to carry more 
information about the phase of the oscillatory signal. We 
note that in this case the information between the output 
count m and the phase can be larger than between the 
upstream protein count and the phase. The data pro- 
cessing inequality does not hold, because the Markovian 
steady-state assumptions used to derive it are not valid 
in this case. 
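
As a rough numerical illustration of Eq (87), the sketch below evaluates the information between the phase and the output count in the limit of infinitely slow driving, where the output distribution at each phase is taken to be the steady state for the instantaneous input. The sharp threshold, the Poisson output noise, and all function names and parameter values are assumptions of this sketch, not the model of the work discussed above. Because this toy circuit has only two output states, its information cannot exceed one bit.

```python
import numpy as np
from math import lgamma


def poisson_pmf(counts, mean):
    """Poisson probabilities P(m | mean) for an array of integer counts."""
    counts = np.asarray(counts, dtype=float)
    log_fact = np.array([lgamma(k + 1.0) for k in counts])
    return np.exp(counts * np.log(mean) - mean - log_fact)


def phase_information_bits(p_m_given_phi):
    """I(phi; m) in bits for a uniform phase distribution.

    Rows of p_m_given_phi index equally spaced phase bins, columns index
    the output count m; each row must be a normalized distribution.
    """
    n_phi = p_m_given_phi.shape[0]
    p_phi = np.full(n_phi, 1.0 / n_phi)       # discretized P(phi) = 1/(2 pi)
    p_m = p_phi @ p_m_given_phi               # P(m) = sum_phi P(m|phi) P(phi)
    joint = p_phi[:, None] * p_m_given_phi    # P(phi, m)
    indep = p_phi[:, None] * p_m[None, :]     # P(phi) P(m)
    mask = joint > 0                          # 0 log 0 = 0 by convention
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))


def quasi_static_channel(n_phi=360, g=1.0, a=0.5, threshold=1.0,
                         m_low=2.0, m_high=30.0, m_max=100):
    """Toy P(m|phi): Poisson output whose mean is set by a sharp threshold."""
    phi = 2.0 * np.pi * (np.arange(n_phi) + 0.5) / n_phi   # phase bin midpoints
    f = g + a * np.sin(phi)                                 # driving signal f
    means = np.where(f > threshold, m_high, m_low)
    counts = np.arange(m_max)
    p = np.array([poisson_pmf(counts, mu) for mu in means])
    return p / p.sum(axis=1, keepdims=True)                 # renormalize truncated tail


info = phase_information_bits(quasi_static_channel())
print(f"I(phi; m) = {info:.3f} bits")
```

Replacing `quasi_static_channel` with the time-dependent solution for $P(m|\phi)$ at a finite driving frequency would be needed to explore the frequency dependence described in the text; that step is not attempted here.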

Lastly, we turn to an approximate result in the case 
of a fully time-dependent solution. In Section III B we
derived the result for a Gaussian channel, where the in- 
formation capacity was determined by the signal-to-noise 
ratio. This result can be generalized to a time-dependent 
stationary system by moving to the Fourier space and 
observing that the information capacity is now simply 
an integral of a frequency-dependent signal-to-noise ra- 
tio across all frequency channels [5B]. While this re- 
sult has been in use in engineering and neuroscience for 
quite some time, it has been introduced in the context of 
gene regulatory networks in a series of pedagogical papers 
[Sni [Ml [5S] . We briefly outline the derivation presented in 
these papers here, while referring the reader to the original
manuscripts for detailed discussion and limiting cases. 

In general the mutual information between a time-dependent
input trajectory, $c(t)$, and output trajectory,
$g(t)$, is given by a generalization of Eq (56):

$$I(c; g) = \int \mathcal{D}c(t) \int \mathcal{D}g(t) \, P[c(t), g(t)] \log_2 \frac{P[c(t), g(t)]}{P[c(t)] \, P[g(t)]}. \qquad (88)$$

In order to evaluate the integrals one needs to consider
all possible paths, which makes the problem extremely 
hard. One possible approach is to assume that the input, 
the output, as well as the (additive) noise in the channel 
jointly obey Gaussian statistics. Defining the deviations 
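of input and output from their respective means,
$\delta c(t) = c(t) - \bar{c}$ and $\delta g(t) = g(t) - \bar{g}$,
we can sample the trajectories of the deviations at $N$
successive, evenly spaced points in time,
$\{t - (N-1)\Delta, t - (N-2)\Delta, \ldots, t - \Delta, t\}$,
pack them into vectors $\delta\mathbf{c}(t)$ and $\delta\mathbf{g}(t)$, and write down
the joint probability distribution for these vectors:

$$P(\delta\mathbf{c}, \delta\mathbf{g}) = \frac{1}{(2\pi)^{N} |C|^{1/2}} \exp\left[ -\frac{1}{2} \begin{pmatrix} \delta\mathbf{c} & \delta\mathbf{g} \end{pmatrix} C^{-1} \begin{pmatrix} \delta\mathbf{c} \\ \delta\mathbf{g} \end{pmatrix} \right], \qquad (89)$$

where the time dependent covariance matrix has the
form:

$$C(t, t') = \begin{pmatrix} C_{cc} & C_{cg} \\ C_{gc} & C_{gg} \end{pmatrix}. \qquad (90)$$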
The elements of the covariance matrix are the correlations
between having, e.g., a concentration of input $\delta c(t)$
at time $t$ and a concentration of output $\delta g(t')$ at time
$t'$, $C_{cg}(t,t') = \langle \delta c(t) \, \delta g(t') \rangle$. Each submatrix $C_{\mu\nu}$ of $C$
with $\mu, \nu = \{c, g\}$ is of dimension $N \times N$.

We note that the assumption of joint gaussianity of in- 
puts and outputs is much stronger than the small noise 
approximation we used in the steady state analysis of 
Section IV A. There we only assumed that the noise profile
$\sigma_g(c)$ is locally a Gaussian at every $c$ around the mean
input-output relation $\bar{g}(c)$ which itself could be
arbitrarily nonlinear. Here, we are making a stronger
approximation that across the whole dynamic range of inputs
and outputs and across time the distribution is jointly 
Gaussian; as a result we gain the ability to consider time- 
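dependent signals. Since we are dealing with Gaussian
distributions, the entropy is proportional to the logarithm
of the variance, as we showed following Eq (49); for an
$N$-dimensional Gaussian with covariance matrix $C$ it reads
$S = \frac{1}{2} \log_2 \left[ (2\pi e)^N |C| \right]$. Plugging Eq (89) into
the definition of the mutual information, one obtains:

$$I(c; g) = S[P(\delta\mathbf{c})] + S[P(\delta\mathbf{g})] - S[P(\delta\mathbf{c}, \delta\mathbf{g})] = \frac{1}{2} \log_2 \frac{|C_{cc}| \, |C_{gg}|}{|C|}. \qquad (91)$$

When the conditional probability $P(g|c)$ is not Gaussian,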
the Gaussian channel approximation remains a lower 
bound on the amount of information that can be trans- 
mitted between the input and the output. This calcula- 
tion therefore remains a very useful first step to gaining 
intuition about the properties of any system. 
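
The Gaussian-channel information can be evaluated directly from the blocks of the covariance matrix, as $\frac{1}{2}\log_2\left(|C_{cc}||C_{gg}|/|C|\right)$. The sketch below does this for a toy system in which the output is the input plus independent noise at each sampling time; the exponentially correlated input, the parameter values and the function names are illustrative assumptions, not taken from the papers cited above. Log-determinants are used instead of determinants to avoid numerical under- or overflow for large $N$.

```python
import numpy as np


def gaussian_information_bits(cov, n):
    """Mutual information (bits) between the first n and the remaining
    components of a jointly Gaussian vector with covariance `cov`."""
    cov = np.asarray(cov, dtype=float)
    _, logdet_cc = np.linalg.slogdet(cov[:n, :n])   # log |C_cc|
    _, logdet_gg = np.linalg.slogdet(cov[n:, n:])   # log |C_gg|
    _, logdet_all = np.linalg.slogdet(cov)          # log |C|
    return 0.5 * (logdet_cc + logdet_gg - logdet_all) / np.log(2.0)


def toy_covariance(n, dt=1.0, tau=5.0, noise_var=1.0 / 3.0):
    """Joint covariance for g = c + independent noise at n sampling times,
    with an exponentially correlated input of unit variance (assumed)."""
    t = dt * np.arange(n)
    c_cc = np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    c_gg = c_cc + noise_var * np.eye(n)
    return np.block([[c_cc, c_cc], [c_cc, c_gg]])   # here C_cg = C_gc = C_cc


for n in (1, 2, 8, 32):
    print(n, gaussian_information_bits(toy_covariance(n), n))
```

For a single time point the toy system reduces to a scalar Gaussian channel with signal-to-noise ratio 3, so the information is $\frac{1}{2}\log_2 4 = 1$ bit; sampling more time points can only add information.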

In the context of time-varying signals, the amount of 
information transmitted is proportional to the duration 
of signal transmission. The quantity we really are inter- 
ested in is the average information rate, which we define 
as: 



$$R(c; g) = \lim_{T \to \infty} \frac{I(c; g)}{T}. \qquad (92)$$



The information rate has units of [bits/sec]. Since in 
gene regulation we most often deal with continuous sig- 
nals, such as concentrations which fluctuate in time, it 
is convenient to continue the analysis in the Fourier
domain. We are interested in a time-averaged information
rate, and in the $T \to \infty$ limit we can restrict our analysis
to stationary signals, $C(t,t') = C(t - t')$. We can thus
rewrite the covariances in terms of their Fourier
transforms, the power spectra, e.g.

$$S_{cg}(\omega) = \int d(t - t') \, C_{cg}(t - t') \, e^{i\omega(t - t')}. \qquad (93)$$



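Next, we rewrite Eq (91) in the Fourier basis and calculate
the information rate from Eq (92) [57]:

$$R(c; g) = -\frac{1}{4\pi} \int_{-\infty}^{\infty} d\omega \, \log_2 \left[ 1 - \frac{|S_{cg}(\omega)|^2}{|S_{cc}(\omega)| \, |S_{gg}(\omega)|} \right]. \qquad (94)$$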



The power spectrum of the output can be written in
terms of the power spectra of the noise, $N(\omega)$, and the
transmitted input, $\Sigma(\omega) = |S_{cg}(\omega)|^2 / S_{cc}(\omega)$:

$$S_{gg}(\omega) = N(\omega) + \Sigma(\omega). \qquad (95)$$

We can now rewrite the rate of information transmission
in terms of the signal-to-noise ratio, $\Sigma(\omega)/N(\omega)$:

$$R(c; g) = \frac{1}{4\pi} \int_{-\infty}^{\infty} d\omega \, \log_2 \left[ 1 + \frac{\Sigma(\omega)}{N(\omega)} \right]. \qquad (96)$$

This result is a generalization of the Gaussian channel to
time dependent stationary signals. If there were no fre- 
quency dependence in the signal-to-noise ratio, we would 
recover the result of Eq (60). In case of stationarity,
Fourier components are statistically independent and the
total information rate is the sum across all frequency
bands. Using that fact, we can motivate the time-dependent
result by taking the total information transmitted
to be a sum of all the Fourier components, $I_n$,
transmitted independently [40].
Taking the limit of continuous frequencies we arrive at
the known form for the rate of information transmission
in Eq (96). In this approximation, a signal at a given
frequency will only trigger a response at that same
frequency, i.e. there is no "frequency mixing." This result
allows us to optimally choose the signal power spectrum 
so that we maximize the rate of information transmission 
through the system with a given noise spectrum $N(\omega)$.
The answer [see Ref. [30] for a derivation] is for the signal
to be complementary to the noise, that is, for the signal
power spectrum $S(\omega)$ to be chosen such
that $S(\omega) + N(\omega) = \mathrm{const}$ ("waterfilling"). For a finite
signal power, this will make the combined spectrum in 
an optimal case look flat and unstructured. 
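
The waterfilling rule can be sketched numerically on a discretized frequency axis. The code below assumes a small set of independent frequency bins with known noise power and a fixed total signal power, and finds the constant "water level" by bisection; the bin values, the function names and the bisection approach are choices made for this illustration only.

```python
import numpy as np


def waterfill(noise, total_power, n_iter=200):
    """Signal power per frequency bin that maximizes sum log2(1 + S/N)
    subject to sum(S) = total_power and S >= 0."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + total_power   # bracket the water level
    for _ in range(n_iter):                           # bisection on the level
        level = 0.5 * (lo + hi)
        if np.maximum(level - noise, 0.0).sum() > total_power:
            hi = level
        else:
            lo = level
    return np.maximum(0.5 * (lo + hi) - noise, 0.0)


def gaussian_bits(signal, noise):
    """Summed Gaussian-channel information over independent bins, in bits."""
    ratio = np.asarray(signal, dtype=float) / np.asarray(noise, dtype=float)
    return 0.5 * float(np.sum(np.log2(1.0 + ratio)))


noise = np.array([1.0, 2.0, 4.0])
optimal = waterfill(noise, total_power=3.0)   # power goes to the two quietest bins
uniform = np.full(3, 1.0)                     # same total power, spread evenly
print(optimal, gaussian_bits(optimal, noise), gaussian_bits(uniform, noise))
```

In this example the noisiest bin lies above the water level and receives no signal power, while in the two remaining bins signal plus noise add up to the same constant.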

This framework has, up to now, been completely gen- 
eral. The application to biological signaling consists of 
computing the covariance structure of the inputs and the
outputs that enters Eq (91) for the specific system under
study. One approach, presented in the work of Tostevin
and ten Wolde, is to calculate them from the linear noise
approximation [50]. In the linear noise approximation the
dynamical equations for the system are linearized around 
their operating point, to yield a linear system with a 
Gaussian additive noise exposed to Gaussian inputs (by 
assumption), so that the statistics of inputs and outputs 
will be jointly Gaussian. One can then calculate all the 
covariances in the system, and finally compute the infor- 
mation rate, as described. For a discussion of the validity 
of the linear noise approximation see Ref jT7] . 
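
To make the procedure concrete, the following sketch computes the information rate for a hypothetical one-step linear circuit, $dg/dt = k\,c - g/\tau_g + \xi(t)$, where $\xi$ is white noise of strength $q$ and the input $c$ has variance `var_c` and an exponentially decaying autocorrelation with time constant `tau_c`. The model, its parameter values and the function names are assumptions of this example rather than results from the cited work. The rate is obtained by integrating $\log_2[1 + \Sigma(\omega)/N(\omega)]$ over all frequencies with the prefactor $1/4\pi$, and is compared against a closed form that holds only for this particular toy model.

```python
import numpy as np


def information_rate_bits(snr, scale=1.0, n=20000):
    """R = 1/(4 pi) * integral over all omega of log2(1 + snr(omega)).

    The infinite frequency axis is mapped to a finite interval through
    omega = scale * tan(theta) and integrated with a midpoint rule.
    """
    theta = (np.arange(n) + 0.5) * np.pi / n - 0.5 * np.pi
    omega = scale * np.tan(theta)
    jacobian = scale / np.cos(theta) ** 2                 # d(omega)/d(theta)
    integrand = np.log1p(snr(omega)) / np.log(2.0) * jacobian
    return float(integrand.sum() * (np.pi / n) / (4.0 * np.pi))


# Hypothetical linear model: dg/dt = k*c - g/tau_g + white noise of strength q,
# driven by an input with variance var_c and correlation time tau_c.
k, tau_g, q, var_c, tau_c = 1.0, 2.0, 0.25, 1.0, 1.0


def input_spectrum(omega):
    """Power spectrum of the exponentially correlated input."""
    return 2.0 * var_c * tau_c / (1.0 + (omega * tau_c) ** 2)


def transmitted_signal(omega):
    """Sigma(omega): squared gain of the linear circuit times the input spectrum."""
    return k ** 2 / (omega ** 2 + tau_g ** -2) * input_spectrum(omega)


def noise_spectrum(omega):
    """N(omega): white noise filtered by the same linear decay."""
    return q / (omega ** 2 + tau_g ** -2)


rate = information_rate_bits(
    lambda w: transmitted_signal(w) / noise_spectrum(w), scale=1.0 / tau_c)

# Closed form for this particular model, used only as a consistency check.
snr0 = 2.0 * k ** 2 * var_c * tau_c / q
exact = (np.sqrt(1.0 + snr0) - 1.0) / (2.0 * tau_c * np.log(2.0))
print(rate, exact)
```

In this toy model the output decay time `tau_g` cancels from the signal-to-noise ratio, because signal and noise pass through the same linear filter; the rate is then set by the input correlation time and the zero-frequency signal-to-noise ratio.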

An important result demonstrated by Tostevin and ten 
Wolde is that a gene regulatory circuit for which the in- 
stantaneous information is zero can have a large infor- 
mation rate for the input/output trajectories, and vice 
versa. An example of such a system is the irreversible
conversion of one molecule into another. Therefore, a
gene circuit may not be transmitting information in its
stationary state, yet it could transmit information in an
oscillatory state. The authors also discuss that when the
gain-to-noise ratio depends on the statistics of the input,
the optimal input power spectrum no longer needs
to obey the simple waterfilling rule. Later, de Ronde and
co-workers [55] systematically studied information
transmission in short regulatory motifs with and without
feedback. They explored signaling cascades with and
without positive or negative feedback, where the feedback
acts either from the output node or from upstream in the
circuit. They showed
that negative feedback from the output onto intermediate 
stages of the cascade is not a good strategy for transmit- 
ting information. Similarly, positive auto-regulation of 
the output node does not increase information transmis- 
sion, while positive auto-regulation of an intermediary 
node does. The effect of feedback between intermediate 
nodes depends on the type of feedback: negative feed- 
back increases transmission fidelity at high frequencies 
(but across the whole bandwidth, the gain-to-noise ra- 
tio decreases overall), while positive feedback increases 
gain-to-noise at low frequencies. This detailed study led
the authors to claim that, in general, feedback, including
auto-regulation, can increase the circuit's ability to
transmit information between the input and output, but 
only if these forms of regulation occur upstream of the 
dominant source of noise. 

Gene regulation is a highly nonlinear process for which 
the linear noise approximation can fail. A simple signa- 
ture that invalidates the linear noise approximation is 
a bimodal distribution of either inputs or outputs (as 
in the case of the bcd/hb system), where the mean will be
a poor representation of either of the two states, and 
the variance will be badly approximated from the linear 
expansion around the mean. On the other hand, there 
are signaling networks that might operate close to the 
linearized regime, e.g. the chemotaxis network of Es- 
cherichia coli. As noted in Ref ^5(T, the linear noise 
approximation is a natural choice for information rate 
calculation in the Gaussian approximation, because lin- 
earization of the dynamical system also decouples fre- 
quency components. We refer the reader to the original 
work in Refs [SDl [SH [SS] for derivation of covariance ma- 
trices and information rates for common motifs in regula- 
tory networks. In gene regulatory networks, covariances 
can have complex forms and the output power spectrum 
can depend on both the statistics of the input and the 
noise fSBl. 



V. RELATED WORK

We focussed on one specific approach to information
processing by gene networks, where we optimize the form
of the regulatory function to maximize the information
between the input and output. This approach has been
enormously successful in sensory neuroscience, where it is
known as the "efficient coding" principle [56]. Assuming
that the statistics of the input signal are fixed by the
environment, the neural processing mechanisms have evolved
to transmit as much of the input information as possible
through noisy neuronal links of limited bandwidth. This
has led to a number of predictions regarding the structure
of receptive fields [5^ , design of the retinal mosaic [551 -
\ET\ , and properties of color vision [S2 ES] . The same
principle has also been invoked to predict that neurons should
dynamically adapt to the modulations in stimulus
statistics. Impressively, experiments have confirmed that the
neurons in fly vision really do scale their input/output
relations so as to match their dynamic range to the
variance of the stimulus and thus increase information flow
[BH [BS] . Recent work has also examined the nature of
optimal population coding in neural networks, exploring
the tradeoffs between fighting the noise through positive
couplings and reducing redundancy through lateral
inhibition; it is interesting to note that in some parameter
regimes optimal codes again turn out to be locally
stable and distinguishable states of the output [BH].
Advances have also been made in constraining information
encoding by network elements with experimental
measurements [67]. These parallels between genetic
regulation and neuroscience certainly motivate us in thinking
that the same set of basic principles might underlie
efficient biological information processing.

Other applications of information theory to cell
regulation have been developed, which consider bounds on
information transmission in biological systems, such as
finding the minimum rate at which information must be
transmitted in the system to ensure the readout of the
signal remains within a fixed value of the signal; these
bounds have much to do with the approach that views
cells and organisms as trying to "decode" noisy
environmental signals and making optimal decisions based on
these data [68]. Information theory has been used to
discuss chemotaxis [MllZ^i, i.e. navigation on the basis of
noisy inputs.

A recent paper has also raised an interesting topic of
learning about biological systems from the way they
systematically deviate from the optimality predictions [7T| .

For a (nonexhaustive) list of other topics where
information theory has also been applied in biology, we
note its use in analyzing evolution of organisms in
unknown environments [72-74], or considering the capacity
of genomes [75]. It has been discussed more generally in
the context of evolution [75].



VI. DISCUSSION 

Biology presents an interesting challenge to physicists: 
many symmetries and simplifications applicable in or- 
dered (but non-animate) systems are absent in biology, 
and this complexity of life can be intimidating. On the 
other hand, biological systems have evolved for function,
and as we make progress in formalizing this notion
mathematically, we hope to gain new insights and predictive
power. 

In this review we attempted to summarize some of the 
progress made over the last few years in using informa- 
tion transmission as a possible measure of function for 
gene regulatory networks. Our goal was to show that 
this is a powerful approach which allows one to calculate 
properties of gene circuits that can be directly compared 
to measurements. One of the interesting aspects we tried 
to illustrate in this review is how microscopic features of 
gene regulation, i.e. the nature of computations / signal 
integration at the promoter, and the form of the noise 
in gene regulation, influence the ability of the network 
to transmit information and affect the pattern of opti- 
mal solutions (e.g. stripes of expression in the gap gene 
network). To physicists this is an interesting lesson, in 
that some features at the macroscopic level can be sen- 
sitive to certain (hopefully not all!) microscopic details 
of regulation. Nevertheless, the calculations and ideas
presented in this review show that attempts at the inter- 
face of physics and biology aimed at understanding how 
physical constraints shape circuit structure and function 
are proving fruitful. 

We also described recent experiments that discuss a 
specific gap gene circuit, active during early development 
of the fly embryo, which appears to function close to the 
limits imposed by noise in gene expression. We empha- 
size again that not all gene regulatory networks are likely 
to be optimized for information transmission, but in the 
case of early development, it does seem possible that the 
formal notion of information reflects faithfully the
developmental "positional information" that enables the
organism to build up complex structures.

Despite progress in understanding information trans- 
mission in gene regulation, a lot of work still remains to 
be done. The biggest formal challenge is to construct 
a general framework for computing information trans- 
mission in time-dependent non-linear networks. The 
second important challenge is to understand how infor- 
mation transmission functions in spatially resolved sys- 
tems where constituent chemicals are not well-mixed, and 
transport phenomena play an important role. Third, as
a challenge to both theory and experiment, we are looking
for a complete derivation of an optimal information
transmission network that includes all relevant regulatory
effects, and for its comparison to both the experimentally
determined network topology and the experimentally
measured information rates. Lastly, we would like to
encourage further work that tries to link information
transmission to other measures of network function, both as a
numerical optimization problem and in models of evolu- 
tion under the assumed network function. It seems likely 
- especially due to the assumption-free nature of infor- 
mation measures - that different measures of function 
could produce consistent results. 